Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26 • 118
We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning Paper • 2508.10433 • Published Aug 14 • 144
CoRT: Code-integrated Reasoning within Thinking Paper • 2506.09820 • Published Jun 11 • 17 • 2
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published May 22 • 58
Enabling Scalable Oversight via Self-Evolving Critic Paper • 2501.05727 • Published Jan 10 • 74
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
Qwen2.5-Math Collection Math-specific model series based on Qwen2.5 • 11 items • Updated Jul 21 • 88
Qwen2-Math Collection Math-specific model series based on Qwen2 • 8 items • Updated Jul 21 • 52
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning Paper • 2407.04078 • Published Jul 4, 2024 • 21
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19, 2024 • 17