
Collections


Collections including paper arxiv:2510.21583

[RL] Text-to-Image

Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24 • 30


Visual Generation Tuning

Paper • 2511.23469 • Published 12 days ago • 13
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space

Paper • 2511.10555 • Published 27 days ago • 60
Group Relative Attention Guidance for Image Editing

Paper • 2510.24657 • Published Oct 28 • 25
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24 • 30

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16 • 15
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16 • 106
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs

Paper • 2510.13795 • Published Oct 15 • 56
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15 • 11

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13 • 176
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24 • 30
Sparser Block-Sparse Attention via Token Permutation

Paper • 2510.21270 • Published Oct 24 • 24

RL makes MLLMs see better than SFT

Paper • 2510.16333 • Published Oct 18 • 48
Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

Paper • 2510.16888 • Published Oct 19 • 21
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16 • 47
Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation

Paper • 2510.21583 • Published Oct 24 • 30

HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video

Paper • 2510.05560 • Published Oct 7 • 7
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Paper • 2510.06217 • Published Oct 7 • 63
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 497
Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30 • 54
