From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 66
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 125
4DNeX: Feed-Forward 4D Generative Modeling Made Easy Paper • 2508.13154 • Published Aug 18, 2025 • 62
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published Aug 18, 2025 • 25
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation Paper • 2503.21979 • Published Mar 27, 2025 • 4
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Paper • 2503.20785 • Published Mar 26, 2025 • 22