Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published 29 days ago
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image Paper • 2511.13648 • Published 29 days ago
Simulating the Visual World with Artificial Intelligence: A Roadmap Paper • 2511.08585 • Published Nov 11
SyncHuman: Synchronizing 2D and 3D Generative Models for Single-view Human Reconstruction Paper • 2510.07723 • Published Oct 9
VChain: Chain-of-Visual-Thought for Reasoning in Video Generation Paper • 2510.05094 • Published Oct 6
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos Paper • 2412.03079 • Published Dec 4, 2024
EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation Paper • 2312.02256 • Published Dec 4, 2023
Disentangled Clothed Avatar Generation from Text Descriptions Paper • 2312.05295 • Published Dec 8, 2023
Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion Paper • 2407.02887 • Published Jul 3, 2024
VistaDream: Sampling multiview consistent images for single-view scene reconstruction Paper • 2410.16892 • Published Oct 22, 2024
You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors Paper • 2109.00182 • Published Sep 1, 2021