Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation Paper • 2512.11792 • Published 14 days ago • 9
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment Paper • 2411.17188 • Published Nov 26, 2024 • 20
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 23
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024 • 17
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering Paper • 2303.11897 • Published Mar 21, 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023