TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering Paper • 1704.04497 • Published Apr 14, 2017
TGIF: A New Dataset and Benchmark on Animated GIF Description Paper • 1604.02748 • Published Apr 10, 2016
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone Paper • 2307.05463 • Published Jul 11, 2023
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published Apr 17, 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding Paper • 2504.13915 • Published Apr 10, 2025
Enhancing Visual Planning with Auxiliary Tasks and Multi-token Prediction Paper • 2507.15130 • Published Jul 20, 2025
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives Paper • 2311.18259 • Published Nov 30, 2023
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30
PaperOrchestra: A Multi-Agent Framework for Automated AI Research Paper Writing Paper • 2604.05018 • Published Apr 6
VQQA: An Agentic Approach for Video Evaluation and Quality Improvement Paper • 2603.12310 • Published Mar 12