MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published 19 days ago • 21
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published 15 days ago • 93
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published 12 days ago • 48
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought Paper • 2601.23184 • Published 18 days ago • 35
AgentOCR: Reimagining Agent History via Optical Self-Compression Paper • 2601.04786 • Published Jan 8 • 29
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 256
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 145
Slow Perception: Let's Perceive Geometric Figures Step-by-step Paper • 2412.20631 • Published Dec 30, 2024 • 15
Document AI Collection All the papers that can fundementally help in creating a true open-source processing pipeline. • 1 item • Updated Nov 11, 2024 • 1
Focus Anywhere for Fine-grained Multi-page Document Understanding Paper • 2405.14295 • Published May 23, 2024 • 1
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Dec 23, 2025 • 86
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 57