Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models Paper • 2509.06949 • Published Sep 8, 2025 • 55
Focusing by Contrastive Attention: Enhancing VLMs' Visual Reasoning Paper • 2509.06461 • Published Sep 8, 2025 • 19
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published Sep 3, 2025 • 32
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 101
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025 • 252
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 19
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17, 2025 • 46
ReLearn: Unlearning via Learning for Large Language Models Paper • 2502.11190 • Published Feb 16, 2025 • 30
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 140
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24, 2024 • 17