SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published 13 days ago • 33
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 12 days ago • 223
facebook/metaclip-2-mt5-worldwide-b32 Zero-Shot Image Classification • 0.3B • Updated Nov 12, 2025 • 707 • 8
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 209
Learning Transferable Visual Models From Natural Language Supervision Paper • 2103.00020 • Published Feb 26, 2021 • 22
DeiTFake: Deepfake Detection Model using DeiT Multi-Stage Training Paper • 2511.12048 • Published Nov 15, 2025 • 1
When Deepfake Detection Meets Graph Neural Network: a Unified and Lightweight Learning Framework Paper • 2508.05526 • Published Aug 7, 2025 • 1
Learnable Instance Attention Filtering for Adaptive Detector Distillation Paper • 2603.26088 • Published Mar 27 • 1
Powerful Teachers Matter: Text-Guided Multi-view Knowledge Distillation with Visual Prior Enhancement Paper • 2603.24208 • Published Mar 25 • 1