MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization • Paper • 2603.12743 • Published 5 days ago • 3 • 3
SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation • Paper • 2603.15150 • Published 2 days ago • 2
sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook • Paper • 2603.13962 • Published 4 days ago • 2
Garments2Look: A Multi-Reference Dataset for High-Fidelity Outfit-Level Virtual Try-On with Clothing and Accessories • Paper • 2603.14153 • Published 4 days ago • 2 • 3
Spectrum Matching: a Unified Perspective for Superior Diffusability in Latent Diffusion • Paper • 2603.14645 • Published 3 days ago • 4 • 2
VoXtream2: Full-stream TTS with dynamic speaking rate control • Paper • 2603.13518 • Published 5 days ago • 1 • 2
Tri-Prompting: Video Diffusion with Unified Control over Scene, Subject, and Motion • Paper • 2603.15614 • Published 2 days ago • 4 • 2
Towards Generalizable Robotic Manipulation in Dynamic Environments • Paper • 2603.15620 • Published 2 days ago • 3 • 2
HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions • Paper • 2603.15612 • Published 2 days ago • 133 • 2
Make it SING: Analyzing Semantic Invariants in Classifiers • Paper • 2603.14610 • Published 3 days ago • 16 • 2
When Does Sparsity Mitigate the Curse of Depth in LLMs • Paper • 2603.15389 • Published 2 days ago • 5 • 2
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models • Paper • 2603.14313 • Published 3 days ago • 3 • 2
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos • Paper • 2603.14145 • Published 4 days ago • 9 • 2
Efficient Document Parsing via Parallel Token Prediction • Paper • 2603.15206 • Published 2 days ago • 3 • 2
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer • Paper • 2603.15478 • Published 2 days ago • 23 • 2