VideoFlexTok: Flexible-Length Coarse-to-Fine Video Tokenization Paper • 2604.12887 • Published 1 day ago
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 1 day ago • 2
Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2604.12374 • Published 1 day ago • 4
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences? Paper • 2604.10718 • Published 3 days ago • 2
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 2 days ago • 108
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators Paper • 2604.11805 • Published 2 days ago • 13
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 5 days ago • 42
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation Paper • 2604.09201 • Published 5 days ago • 2
ELT: Elastic Looped Transformers for Visual Generation Paper • 2604.09168 • Published 5 days ago • 18
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published 5 days ago • 8
Post 122 OpenAI is hiring SLAM engineers! And open source shouldn't lag behind. It's a pretty hard and necessary problem that must be solved to bring generalisable robots into the real world. We are pushing out first deep down & will be open-sourcing stuff in the next releases. Hope everyone is ready! Cheers to HF & more hugs. Find us at https://x.com/fpv_labs/status/2042585804162371713
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published 7 days ago • 15
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models Paper • 2604.08340 • Published 6 days ago • 8
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published 6 days ago • 273
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web Paper • 2604.08516 • Published 6 days ago • 41
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 6 days ago • 46
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling Paper • 2604.07209 • Published 7 days ago • 35