XiangpengYang committed
Commit 509dc63 · 1 Parent(s): ea4ac22

update readme and add motivation gif with LFS

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +18 -13
  3. assets/motivation_v2.gif +3 -0
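
The per-file "+N -M" counts above are a standard git diffstat. For reference, the same summary can be reproduced from a local clone with ordinary git commands (a minimal sketch; the hashes `509dc63` and `ea4ac22` come from the commit header above):

```bash
# Show this commit and its per-file change counts locally.
git log --oneline -1 509dc63   # "update readme and add motivation gif with LFS"
git show --stat 509dc63        # .gitattributes, README.md, assets/motivation_v2.gif
```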
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/motivation_v2.gif filter=lfs diff=lfs merge=lfs -text
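
The new `.gitattributes` rule is exactly what `git lfs track` writes. A minimal sketch of the equivalent workflow, assuming a standard Git LFS setup (the file name and commit message are taken from this commit; everything else is the usual LFS procedure):

```bash
# Track the large GIF with LFS so only a small pointer file is committed to git history.
git lfs install                                   # one-time setup per machine
git lfs track "assets/motivation_v2.gif"          # appends the rule shown above to .gitattributes
git add .gitattributes assets/motivation_v2.gif
git commit -m "update readme and add motivation gif with LFS"
```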
README.md CHANGED
@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
 tags:
-- video-editing
-- computer-vision
 - video-generation
+- video-editing
 - in-context-learning
 - pytorch
 pipeline_tag: video-to-video
-library_name: generic
+library_name: transformers
 authors:
 - XiangpengYang
 - horizonwind2004
@@ -24,13 +23,12 @@ authors:
 </h4>
 
 <h4 style="margin: 15px 0; color: #2c3e50;">
-🚀 SOTA performance on VideoCoF-Bench with just 50k training pairs!
+🚀 A Chain-of-Frames editing method that enables temporal reasoning and 4&times; video-length generalization with just 50k training pairs!
 </h4>
 
-<a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/Paper-PDF-red" alt="Paper"></a>
 <a href="https://arxiv.org/abs/2400.00000"><img src="https://img.shields.io/badge/arXiv-2400.00000-b31b1b.svg" alt="arXiv"></a>
 <a href="https://videocof.github.io"><img src="https://img.shields.io/badge/Project-Page-green" alt="Project Page"></a>
-<a href="https://github.com/videocof/VideoCoF"><img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github" alt="GitHub"></a>
+<a href="https://github.com/knightyxp/VideoCoF"><img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github" alt="GitHub"></a>
 
 </div>
 
@@ -51,17 +49,24 @@ authors:
 
 # VideoCoF: Unified Video Editing with Temporal Reasoner
 
-**VideoCoF** is a unified video editing model that bridges the gap between expert models (precise but restricted) and unified in-context models (flexible but spatially inaccurate). By introducing a **"See &rarr; Reason &rarr; Edit"** paradigm, VideoCoF predicts reasoning tokens before generating target video tokens, ensuring edits are applied accurately to intended targets.
+
+**VideoCoF** is a unified video editing model that bridges the gap between expert models (precise but restricted) and unified in-context models (flexible but spatially inaccurate). By introducing **"See &rarr; Reason &rarr; Edit"**, a Chain-of-Frames paradigm, VideoCoF predicts reasoning tokens before generating the target video tokens, removing the need for user-provided masks while achieving precise instruction-to-region alignment.
 
 <div align="center">
-  <video autoplay muted loop playsinline controls width="100%">
-    <source src="https://github.com/user-attachments/assets/0e3eafae-3a62-4cd8-bf4a-37d9b280d05d" type="video/mp4">
-  </video>
+  <a href="https://www.youtube.com/watch?v=3iNUH1Dq9-0" target="_blank">
+    <img src="https://img.youtube.com/vi/3iNUH1Dq9-0/maxresdefault.jpg"
+         alt="Video Demo"
+         width="80%"
+         style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
+  </a>
+  <br>
+  <em>Click the image above to watch the full video on YouTube 🎬</em>
 </div>
 
 ## 🌟 Key Capabilities
+![](assets/motivation_v2.gif)
 
-1. **Unified Reasoning**: Adopts a unique approach where the model first identifies *where* and *how* to edit (Reasoning) before performing the edit.
+1. **Temporal Reasoning**: Adopts a unique approach where the model first identifies *where* and *how* to edit (Reasoning) before predicting the target video tokens.
 2. **Data Efficiency**: Achieves SOTA performance with only **50k training pairs** (33 frames each).
 3. **Length Extrapolation**: Demonstrates robust multi-shot editing and can generalize to videos **4&times; longer** than training samples.
 4. **Versatile Editing**: Supports:
@@ -72,12 +77,12 @@ authors:
 
 ## 🔧 Quick Start
 
-To use these weights, please refer to the official [GitHub Repository](https://github.com/videocof/VideoCoF) for inference code and environment setup.
+To use these weights, please refer to the official [GitHub Repository](https://github.com/knightyxp/VideoCoF) for inference code and environment setup.
 
 ### Installation
 
 ```bash
-git clone [https://github.com/videocof/VideoCoF.git](https://github.com/videocof/VideoCoF.git)
+git clone https://github.com/knightyxp/VideoCoF.git
 cd VideoCoF
 
 # Create environment
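
The Quick Start in the README defers inference code and environment setup to the GitHub repository. If the weights live in this Hugging Face model repo, a typical way to fetch them is sketched below; note that the repo id `XiangpengYang/VideoCoF` is an assumption for illustration, not something stated in this commit:

```bash
# Hypothetical weight download via the Hugging Face CLI; replace the repo id with the real one.
pip install -U "huggingface_hub[cli]"
huggingface-cli download XiangpengYang/VideoCoF --local-dir ./checkpoints/VideoCoF
# Then follow the inference instructions at https://github.com/knightyxp/VideoCoF
```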
assets/motivation_v2.gif ADDED

Git LFS Details

  • SHA256: 2660dfc2cd2ab6a581704d5c6a4b1be937cdd416b4e8680c8969feb537992133
  • Pointer size: 133 Bytes
  • Size of remote file: 67.5 MB
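
These three values describe the LFS pointer that git stores in place of the GIF. After cloning, the stored pointer (and hence the SHA256 above) can be confirmed with plain git, assuming Git LFS is installed:

```bash
# Print the 133-byte pointer blob that git keeps instead of the 67.5 MB GIF.
git cat-file -p HEAD:assets/motivation_v2.gif
# Expected output, roughly:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:2660dfc2cd2ab6a581704d5c6a4b1be937cdd416b4e8680c8969feb537992133
#   size <exact byte count of the GIF (about 67.5 MB)>
```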