- Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
  Paper • 2508.09789 • Published • 5
- MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
  Paper • 2508.13186 • Published • 18
- ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
  Paper • 2508.04038 • Published • 1
- Prompt Orchestration Markup Language
  Paper • 2508.13948 • Published • 48
Collections
Collections including paper arxiv:2506.23918

- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 107
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 151
- Autoregressive Diffusion Models
  Paper • 2110.02037 • Published
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 8

- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
  Paper • 2506.23918 • Published • 89
- LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
  Paper • 2507.01945 • Published • 78
- How to Train Your LLM Web Agent: A Statistical Diagnosis
  Paper • 2507.04103 • Published • 50

- Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics
  Paper • 2502.11859 • Published
- Does Spatial Cognition Emerge in Frontier Models?
  Paper • 2410.06468 • Published • 2
- Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations
  Paper • 2506.04633 • Published • 19
- PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models
  Paper • 2502.08636 • Published

- Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
  Paper • 2508.08221 • Published • 49
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
  Paper • 2508.02120 • Published • 19
- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
  Paper • 2506.23918 • Published • 89
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 225

- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 259
- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
  Paper • 2506.23918 • Published • 89
- Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
  Paper • 2507.16784 • Published • 122
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
  Paper • 2510.04618 • Published • 123

- π^3: Scalable Permutation-Equivariant Visual Geometry Learning
  Paper • 2507.13347 • Published • 64
- Voxtral
  Paper • 2507.13264 • Published • 29
- SingLoRA: Low Rank Adaptation Using a Single Matrix
  Paper • 2507.05566 • Published • 113
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
  Paper • 2507.09477 • Published • 86

- Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
  Paper • 2506.23918 • Published • 89
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
  Paper • 2504.16030 • Published • 36
- Time Blindness: Why Video-Language Models Can't See What Humans Can?
  Paper • 2505.24867 • Published • 80
- GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
  Paper • 2507.01006 • Published • 240