Commit 509dc63
Parent(s): ea4ac22

update readme and add motivation gif with LFS

Files changed:
- .gitattributes +1 -0
- README.md +18 -13
- assets/motivation_v2.gif +3 -0
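For readers following along locally, the file-change summary above can be reproduced with standard git commands (assuming the repository is already cloned):

```bash
# Per-file change summary, matching the file list above
git show --stat 509dc63

# Full patch, including the diff hunks reproduced below
git show 509dc63
```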
.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/motivation_v2.gif filter=lfs diff=lfs merge=lfs -text
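The added .gitattributes line is what makes git store the GIF as an LFS pointer instead of a raw blob. A minimal sketch of how such a line is typically produced (the commit itself does not say which commands were used):

```bash
# One-time setup per machine: install the LFS filter hooks
git lfs install

# Track the new asset; this appends the matching
# "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes
git lfs track "assets/motivation_v2.gif"

# Stage both the attributes change and the binary, then commit
git add .gitattributes assets/motivation_v2.gif
git commit -m "update readme and add motivation gif with LFS"
```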
README.md CHANGED

@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
 tags:
-- video-editing
-- computer-vision
 - video-generation
+- video-editing
 - in-context-learning
 - pytorch
 pipeline_tag: video-to-video
-library_name:
+library_name: transformers
 authors:
 - XiangpengYang
 - horizonwind2004
@@ -24,13 +23,12 @@ authors:
 </h4>
 
 <h4 style="margin: 15px 0; color: #2c3e50;">
-
+A Chain-of-Frames editing method enables temporal reasoning and 4× video-length generalization with just 50k training pairs!
 </h4>
 
-<a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/Paper-PDF-red" alt="Paper"></a>
 <a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/arXiv-2400.00000-b31b1b.svg" alt="arXiv"></a>
 <a href="https://videocof.github.io"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
-<a href="https://github.com/
+<a href="https://github.com/knightyxp/VideoCoF"><img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github" alt="GitHub"></a>
 
 </div>
 
@@ -51,17 +49,24 @@ authors:
 
 # VideoCoF: Unified Video Editing with Temporal Reasoner
 
-
+
+**VideoCoF** is a unified video editing model that bridges the gap between expert models (precise but restricted) and unified in-context models (flexible but spatially inaccurate). By introducing a **"See → Reason → Edit"** Chain-of-Frames paradigm, VideoCoF predicts reasoning tokens before generating the target video tokens, removing the need for user-provided masks while achieving precise instruction-to-region alignment.
 
 <div align="center">
-<
-<
-
+<a href="https://www.youtube.com/watch?v=3iNUH1Dq9-0" target="_blank">
+<img src="https://img.youtube.com/vi/3iNUH1Dq9-0/maxresdefault.jpg"
+alt="Video Demo"
+width="80%"
+style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
+</a>
+<br>
+<em>Click the image above to watch the full video on YouTube</em>
 </div>
 
 ## Key Capabilities
+
 
-1. **
+1. **Temporal Reasoning**: Adopts a unique approach where the model first identifies *where* and *how* to edit (reasoning) before predicting the target video tokens.
 2. **Data Efficiency**: Achieves SOTA performance with only **50k training pairs** (33 frames each).
 3. **Length Extrapolation**: Demonstrates robust multi-shot editing and can generalize to videos **4× longer** than training samples.
 4. **Versatile Editing**: Supports:
@@ -72,12 +77,12 @@ authors:
 
 ## Quick Start
 
-To use these weights, please refer to the official [GitHub Repository](https://github.com/
+To use these weights, please refer to the official [GitHub Repository](https://github.com/knightyxp/VideoCoF) for inference code and environment setup.
 
 ### Installation
 
 ```bash
-git clone [https://github.com/
+git clone https://github.com/knightyxp/VideoCoF
 cd VideoCoF
 
 # Create environment
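Because the checkpoint files in this repository are LFS-tracked, git needs the LFS filters enabled before cloning, or the clone will contain only pointer files. A minimal sketch (the Hugging Face repo path below is a placeholder, not stated in this commit):

```bash
# One-time: enable the LFS smudge/clean filters
git lfs install

# Clone the weights; <org> is a placeholder for this repo's real owner
git clone https://huggingface.co/<org>/VideoCoF
```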
assets/motivation_v2.gif ADDED (stored with Git LFS)
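The "+3 -0" shown for the GIF in the file list reflects that Git LFS commits a three-line pointer file, not the binary itself. A short sketch of how to inspect what was actually committed (standard git/git-lfs commands; the oid and size are placeholders, not the real values):

```bash
# List files tracked by LFS in the current checkout
git lfs ls-files

# Show the pointer that git actually stores for the gif;
# a pointer file has exactly three lines, hence "+3 -0":
git show HEAD:assets/motivation_v2.gif
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<64-hex-digest>   (placeholder)
#   size <bytes>                 (placeholder)

# Download and materialize the real binary content
git lfs pull
```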