Commit 509dc63
Parent(s): ea4ac22

update readme and add motivation gif with LFS

Files changed:
- .gitattributes +1 -0
- README.md +18 -13
- assets/motivation_v2.gif +3 -0
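For readers following along locally, the file-change summary above can be reproduced with standard git commands (assuming the repository is already cloned):

```bash
# Per-file change summary, matching the file list above
git show --stat 509dc63

# Full patch, including the diff hunks reproduced below
git show 509dc63
```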
.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/motivation_v2.gif filter=lfs diff=lfs merge=lfs -text
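The added .gitattributes line is what makes git store the GIF as an LFS pointer instead of a raw blob. A minimal sketch of how such a line is typically produced (the commit itself does not say which commands were used):

```bash
# One-time setup per machine: install the LFS filter hooks
git lfs install

# Track the new asset; this appends the matching
# "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes
git lfs track "assets/motivation_v2.gif"

# Stage both the attributes change and the binary, then commit
git add .gitattributes assets/motivation_v2.gif
git commit -m "update readme and add motivation gif with LFS"
```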
README.md CHANGED

@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
 tags:
-- video-editing
-- computer-vision
 - video-generation
+- video-editing
 - in-context-learning
 - pytorch
 pipeline_tag: video-to-video
-library_name:
+library_name: transformers
 authors:
 - XiangpengYang
 - horizonwind2004
@@ -24,13 +23,12 @@ authors:
 </h4>
 
 <h4 style="margin: 15px 0; color: #2c3e50;">
-
+A Chain-of-Frames editing method enables temporal reasoning and 4× video-length generalization with just 50k training pairs!
 </h4>
 
-<a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/Paper-PDF-red" alt="Paper"></a>
 <a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/arXiv-2400.00000-b31b1b.svg" alt="arXiv"></a>
 <a href="https://videocof.github.io"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
-<a href="https://github.com/
+<a href="https://github.com/knightyxp/VideoCoF"><img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github" alt="GitHub"></a>
 
 </div>
 
@@ -51,17 +49,24 @@ authors:
 
 # VideoCoF: Unified Video Editing with Temporal Reasoner
 
-
+
+**VideoCoF** is a unified video editing model that bridges the gap between expert models (precise but restricted) and unified in-context models (flexible but spatially inaccurate). By introducing a **"See → Reason → Edit"** Chain-of-Frames paradigm, VideoCoF predicts reasoning tokens before generating the target video tokens, removing the need for user-provided masks while achieving precise instruction-to-region alignment.
 
 <div align="center">
-<
-<
-
+<a href="https://www.youtube.com/watch?v=3iNUH1Dq9-0" target="_blank">
+<img src="https://img.youtube.com/vi/3iNUH1Dq9-0/maxresdefault.jpg"
+alt="Video Demo"
+width="80%"
+style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
+</a>
+<br>
+<em>Click the image above to watch the full video on YouTube</em>
 </div>
 
 ## Key Capabilities
+
 
-1. **
+1. **Temporal Reasoning**: Adopts a unique approach where the model first identifies *where* and *how* to edit (reasoning) before predicting the target video tokens.
 2. **Data Efficiency**: Achieves SOTA performance with only **50k training pairs** (33 frames each).
 3. **Length Extrapolation**: Demonstrates robust multi-shot editing and can generalize to videos **4× longer** than training samples.
 4. **Versatile Editing**: Supports:
@@ -72,12 +77,12 @@ authors:
 
 ## Quick Start
 
-To use these weights, please refer to the official [GitHub Repository](https://github.com/
+To use these weights, please refer to the official [GitHub Repository](https://github.com/knightyxp/VideoCoF) for inference code and environment setup.
 
 ### Installation
 
 ```bash
-git clone [https://github.com/
+git clone https://github.com/knightyxp/VideoCoF
 cd VideoCoF
 
 # Create environment
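Because the checkpoint files in this repository are LFS-tracked, git needs the LFS filters enabled before cloning, or the clone will contain only pointer files. A minimal sketch (the Hugging Face repo path below is a placeholder, not stated in this commit):

```bash
# One-time: enable the LFS smudge/clean filters
git lfs install

# Clone the weights; <org> is a placeholder for this repo's real owner
git clone https://huggingface.co/<org>/VideoCoF
```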
assets/motivation_v2.gif ADDED (stored with Git LFS)
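The "+3 -0" shown for the GIF in the file list reflects that Git LFS commits a three-line pointer file, not the binary itself. A short sketch of how to inspect what was actually committed (standard git/git-lfs commands; the oid and size are placeholders, not the real values):

```bash
# List files tracked by LFS in the current checkout
git lfs ls-files

# Show the pointer that git actually stores for the gif;
# a pointer file has exactly three lines, hence "+3 -0":
git show HEAD:assets/motivation_v2.gif
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<64-hex-digest>   (placeholder)
#   size <bytes>                 (placeholder)

# Download and materialize the real binary content
git lfs pull
```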