Min-Jung Kim
emjay73
AI & ML interests
Computer Vision, 3D, NeRF
Recent Activity
upvoted a paper about 1 month ago
PHUMA: Physically-Grounded Humanoid Locomotion Dataset
upvoted a paper about 2 months ago
RL makes MLLMs see better than SFT
upvoted a paper 5 months ago
DesignLab: Designing Slides Through Iterative Detection and Correction
Organizations
None yet
Feature
video generation
- Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
  Paper • 2312.04483 • Published • 7
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
  Paper • 2312.03793 • Published • 18
- Photorealistic Video Generation with Diffusion Models
  Paper • 2312.06662 • Published • 24
- PEEKABOO: Interactive Video Generation via Masked-Diffusion
  Paper • 2312.07509 • Published • 12
3D Recon
- GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
  Paper • 2312.11458 • Published • 5
- SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
  Paper • 2312.11392 • Published • 20
- LangSplat: 3D Language Gaussian Splatting
  Paper • 2312.16084 • Published • 16
- Human101: Training 100+FPS Human Gaussians in 100s from 1 View
  Paper • 2312.15258 • Published • 10
2D Recognition
Data
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 13
- MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions
  Paper • 2407.06358 • Published • 19
- 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark
  Paper • 2412.07825 • Published • 12
- Sekai: A Video Dataset towards World Exploration
  Paper • 2506.15675 • Published • 65
4D generation
- GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
  Paper • 2403.12365 • Published • 11
- Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
  Paper • 2407.11398 • Published • 10
- SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
  Paper • 2407.17470 • Published • 16
4D Perception
3D Animatable Face
LLM service
3D Edit
- SANeRF-HQ: Segment Anything for NeRF in High Quality
  Paper • 2312.01531 • Published • 8
- Segment Any 3D Gaussians
  Paper • 2312.00860 • Published • 10
- NeRFiller: Completing Scenes via Generative 3D Inpainting
  Paper • 2312.04560 • Published • 12
- ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields
  Paper • 2401.17895 • Published • 16
3D generation
- HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image
  Paper • 2312.04543 • Published • 22
- Stable Score Distillation for High-Quality 3D Generation
  Paper • 2312.09305 • Published • 10
- GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
  Paper • 2312.11461 • Published • 19
- 3D-LFM: Lifting Foundation Model
  Paper • 2312.11894 • Published • 15
architecture
- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  Paper • 2312.05605 • Published • 3
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 39
- Rethinking Patch Dependence for Masked Autoencoders
  Paper • 2401.14391 • Published • 26
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning
  Paper • 2401.14404 • Published • 18
multimodal
- Gemini: A Family of Highly Capable Multimodal Models
  Paper • 2312.11805 • Published • 47
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 17
- Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
  Paper • 2405.18669 • Published • 12
- Ming-Omni: A Unified Multimodal Model for Perception and Generation
  Paper • 2506.09344 • Published • 28
2D Perception
VideoEdit
Tracking
Optimization
Audio generation
4D Recon
2D generation
- Generative Powers of Ten
  Paper • 2312.02149 • Published • 8
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
  Paper • 2312.01409 • Published • 11
- SANeRF-HQ: Segment Anything for NeRF in High Quality
  Paper • 2312.01531 • Published • 8
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
  Paper • 2312.04410 • Published • 15