A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers Paper • 2508.21148 • Published Aug 28 • 140
SPARK: Synergistic Policy And Reward Co-Evolving Framework Paper • 2509.22624 • Published Sep 26 • 17
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 109
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1 • 50
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence Paper • 2505.23764 • Published May 29 • 3
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings Paper • 2506.04997 • Published Jun 5
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 31
GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs Paper • 2506.00991 • Published Jun 1
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering Paper • 2507.08776 • Published Jul 11 • 54
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published Jun 20 • 64
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 68
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25 • 35
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 48
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 36
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 74