EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning Paper • 2509.20360 • Published Sep 24 • 17
Video-P2P: Video Editing with Cross-attention Control Paper • 2303.04761 • Published Mar 8, 2023 • 2
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024 • 2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1