Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 8 days ago • 83
PaliGemma – Google's Cutting-Edge Open Vision Language Model Article • May 14, 2024 • 277
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries Paper • 2502.16636 • Published Feb 23 • 1
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation Paper • 2510.17354 • Published Oct 20 • 33
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10, 2024 • 16
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search Paper • 2509.25454 • Published Sep 29 • 140
GRACE: Generative Representation Learning via Contrastive Policy Optimization Paper • 2510.04506 • Published Oct 6 • 10
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window Paper • 2510.08276 • Published Oct 9 • 9
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10 • 49
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers Paper • 2311.17136 • Published Nov 28, 2023 • 8
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 32
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 38
REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation Paper • 2508.08149 • Published Aug 11 • 2
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 189
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4 • 75
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 238
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7 • 141