SeeNav-Agent: Enhancing Vision-Language Navigation with Visual Prompt and Step-Level Policy Optimization Paper • 2512.02631 • Published 9 days ago • 8
TV2TV: A Unified Framework for Interleaved Language and Video Generation Paper • 2512.05103 • Published 7 days ago • 14
SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 7 days ago • 19
ProPhy: Progressive Physical Alignment for Dynamic World Simulation Paper • 2512.05564 • Published 6 days ago • 4
COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence Paper • 2512.04563 • Published 7 days ago • 13
Embodied Referring Expression Comprehension in Human-Robot Interaction Paper • 2512.06558 • Published 5 days ago • 2
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators Paper • 2512.06963 • Published 4 days ago • 3
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation Paper • 2512.08186 • Published 2 days ago • 17
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment Paper • 2512.06628 • Published 4 days ago • 12
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published 3 days ago • 36
Reflection Removal through Efficient Adaptation of Diffusion Transformers Paper • 2512.05000 • Published 7 days ago • 14
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published Apr 22 • 22
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning Paper • 2511.19900 • Published 16 days ago • 46
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image Paper • 2511.13648 • Published 24 days ago • 52
GigaWorld-0: World Models as Data Engine to Empower Embodied AI Paper • 2511.19861 • Published 16 days ago • 30
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published 17 days ago • 27
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published 17 days ago • 30
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published 22 days ago • 51
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models Paper • 2510.25889 • Published Oct 29 • 64