Collections
Discover the best community collections!
Collections including paper arxiv:2505.14652

- Let LLMs Break Free from Overthinking via Self-Braking Tuning
  Paper • 2505.14604 • Published • 23
- AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
  Paper • 2505.16944 • Published • 8
- Training Step-Level Reasoning Verifiers with Formal Verification Tools
  Paper • 2505.15960 • Published • 7
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
  Paper • 2505.15134 • Published • 6

- Fine-Tuning Language Models from Human Preferences
  Paper • 1909.08593 • Published • 3
- PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models
  Paper • 2503.02324 • Published
- How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study
  Paper • 2504.00829 • Published
- GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
  Paper • 2504.02546 • Published • 2

- Training Language Models to Self-Correct via Reinforcement Learning
  Paper • 2409.12917 • Published • 140
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve
  Paper • 2407.18219 • Published • 3
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
  Paper • 2408.16293 • Published • 27
- Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models
  Paper • 2409.04787 • Published • 1

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 510 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88