RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
• arXiv:2412.14922 • 88 upvotes
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
• arXiv:2412.17256 • 47 upvotes
Deliberation in Latent Space via Differentiable Cache Augmentation
• arXiv:2412.17747 • 32 upvotes
Outcome-Refining Process Supervision for Code Generation
• arXiv:2412.15118 • 19 upvotes
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
• arXiv:2501.03262 • 104 upvotes
Evolving Deeper LLM Thinking
• arXiv:2501.09891 • 115 upvotes
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
• arXiv:2501.12948 • 444 upvotes
Kimi k1.5: Scaling Reinforcement Learning with LLMs
• arXiv:2501.12599 • 128 upvotes
Towards General-Purpose Model-Free Reinforcement Learning
• arXiv:2501.16142 • 31 upvotes
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
• arXiv:2501.17703 • 59 upvotes