ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction Paper • 2511.20937 • Published 12 days ago • 15
SAM3 Video Segmentation • Running on Zero • Featured • 85 • Track and label objects in videos using text prompts or clicks
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation Paper • 2510.14949 • Published Oct 16 • 5
bryanzhou008/vit-base-patch16-224-in21k-finetuned-inaturalist Image Classification • 85.8M • Updated Aug 19 • 108 • 1
Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making Paper • 2410.07166 • Published Oct 9, 2024 • 3
Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge Paper • 2310.15066 • Published Oct 23, 2023 • 1
Non-Sequential Graph Script Induction via Multimedia Grounding Paper • 2305.17542 • Published May 27, 2023 • 1