aishiknagar's Collections: RL and Agents
s3: You Don't Need That Much Data to Train a Search Agent via RL
arXiv:2505.14146
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
arXiv:2505.19443
ARM: Adaptive Reasoning Model
arXiv:2505.20258
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
arXiv:2505.19914
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
arXiv:2505.22617
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
arXiv:2505.21457
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
arXiv:2505.23754
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
arXiv:2505.24864
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
arXiv:2506.01939
Resa: Transparent Reasoning Models via SAEs
arXiv:2506.09967
Reasoning with Exploration: An Entropy Perspective
arXiv:2506.14758
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
arXiv:2506.13585
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
arXiv:2506.14965
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
arXiv:2506.15211
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
arXiv:2507.10532
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once
arXiv:2507.10541
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
arXiv:2507.06261
LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models
arXiv:2507.09411
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner
arXiv:2507.13332