## 📖 Introduction


**UnityVideo** is a unified generalist framework for multi-task, multi-modal video generation and understanding that enables:

- 🎨 **Text-to-Video Generation**: Create high-quality videos from text descriptions
- 🎮 **Controllable Generation**: Fine-grained control over video generation by conditioning on various modalities
- 🔍 **Modality Estimation**: Estimate depth, surface normals, and other modalities from video
- 🌟 **Zero-Shot Generalization**: Strong generalization to novel objects and styles without additional training

Our unified architecture achieves state-of-the-art performance across multiple video generation benchmarks while maintaining efficiency and scalability.

---