Vision - a diwank Collection



Vision

updated 11 days ago

apple/DepthPro

Depth Estimation • Updated Feb 28 • 19.5k • 488
rhymes-ai/Aria

Image-Text-to-Text • 25B • Updated Apr 23 • 43.2k • 637
mit-han-lab/hart-0.7b-1024px

Unconditional Image Generation • Updated Nov 17, 2024 • 13
deepseek-ai/Janus-1.3B

Any-to-Any • 2B • Updated Jan 27 • 7.45k • 592
bingbangboom/flux-film-camera

Text-to-Image • Updated Nov 15, 2024 • 27 • 28
neulab/PangeaInstruct

Updated Feb 2 • 242 • 86
genmo/mochi-1-preview

Text-to-Video • Updated Sep 4 • 3.6k • 1.3k
stabilityai/stable-diffusion-3.5-large

Text-to-Image • Updated Oct 22, 2024 • 50.6k • 3.26k
Freepik/flux.1-lite-8B-alpha

Text-to-Image • Updated Dec 30, 2024 • 445 • 427
microsoft/OmniParser

Image-Text-to-Text • Updated Dec 2, 2024 • 497 • 1.7k
mistralai/Pixtral-12B-Base-2409

Updated Jul 28 • 32 • 105
neulab/Pangea-7B

8B • Updated Oct 24, 2024 • 6.28k • 131
jadechoghari/Ferret-UI-Llama8b

Image-Text-to-Text • 8B • Updated Jan 8 • 279 • 68
OpenGVLab/InternVL2-1B

Image-Text-to-Text • 0.9B • Updated Mar 25 • 472k • 77
OpenGVLab/InternVL2-2B

Image-Text-to-Text • 2B • Updated Mar 25 • 1.14M • 76
OpenGVLab/Mono-InternVL-2B

Image-Text-to-Text • 3B • Updated Jul 22 • 8.68k • 36
OpenGVLab/OmniCorpus-YT

Updated Mar 20 • 746 • 13
OpenGVLab/OmniCorpus-CC-210M

Viewer • Updated Mar 20 • 208M • 252 • 32
OpenGVLab/OmniCorpus-CC

Viewer • Updated Mar 20 • 872M • 19.6k • 22
OpenGVLab/InternVideo2_chat_8B_HD

Video-Text-to-Text • 8B • Updated Dec 18, 2024 • 106 • 18
OpenGVLab/ViCLIP

Updated Jun 7, 2024 • 45
OpenGVLab/ASMv2

Text Generation • Updated Feb 29, 2024 • 253 • 16
OpenGVLab/VideoChat2-IT

Viewer • Updated Jun 29, 2024 • 1.82M • 389 • 51
NimVideo/cogvideox-2b-img2vid

Image-to-Video • Updated Oct 28, 2024 • 190 • 80
BAAI/Infinity-MM

Updated Dec 13, 2024 • 4.96k • 113
nvidia/RADIO-H

0.7B • Updated Jul 4 • 103 • 10
Spawning/PD12M

Viewer • Updated Jan 9 • 12.4M • 2.55k • 170
Shitao/OmniGen-v1

Text-to-Image • Updated Nov 7, 2024 • 1.47k • 321
InstantX/InstantIR

Image-to-Image • Updated Nov 7, 2024 • 2 • 180
nvidia/Cosmos-0.1-Tokenizer-DI8x8

Updated Dec 25, 2024 • 151 • 11
BAAI/Emu3-Chat

Text Generation • 8B • Updated Oct 24, 2024 • 644 • 73
briaai/RMBG-2.0

Image Segmentation • 0.2B • Updated 20 days ago • 287k • 960
Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11, 2024 • 21
rain1011/pyramid-flow-miniflux

Text-to-Video • Updated Nov 13, 2024 • 176
OpenGVLab/InternVL2-8B-MPO

Image-Text-to-Text • 8B • Updated Dec 20, 2024 • 112 • 37
mistralai/Pixtral-Large-Instruct-2411

Updated Jul 28 • 149 • 426
briaai/BRIA-2.3

Text-to-Image • Updated Apr 10 • 81 • 38
microsoft/Reducio-VAE

Updated Nov 21, 2024 • 42 • 17
Lightricks/LTX-Video

Image-to-Video • Updated Jul 16 • 282k • 2.06k
apple/aimv2-3B-patch14-448

Image Feature Extraction • 3B • Updated Jul 8 • 112 • 13
THUdyh/Insight-V-Reason

Text Generation • 8B • Updated Nov 22, 2024 • 12 • 9
black-forest-labs/FLUX.1-Fill-dev

Updated Jun 27 • 150k • 964
Efficient-Large-Model/Sana_1600M_512px

Text-to-Image • Updated Jan 10 • 386 • 39
Efficient-Large-Model/Sana_1600M_1024px

Text-to-Image • Updated Oct 28 • 367 • 215
AIDC-AI/Ovis1.6-Gemma2-27B

Image-Text-to-Text • 29B • Updated Feb 26 • 129 • 62
HuggingFaceTB/SmolVLM-Base

Image-Text-to-Text • 2B • Updated Nov 28, 2024 • 4.18k • 84
zai-org/glm-edge-v-5b

Image-Text-to-Text • 5B • Updated Jan 2 • 129 • 12
rhymes-ai/Aria-Base-64K

Image-Text-to-Text • 25B • Updated Dec 1, 2024 • 24 • 14
allenai/pixmo-point-explanations

Viewer • Updated Dec 5, 2024 • 79.6k • 161 • 9
tencent/HunyuanVideo

Text-to-Video • Updated Mar 6 • 1.19k • 2.08k
tencent/HunyuanVideo-PromptRewrite

Updated Dec 6, 2024 • 220 • 52
google/paligemma2-28b-pt-896

Image-Text-to-Text • 28B • Updated Dec 5, 2024 • 236 • 50
OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • 78B • Updated Sep 11 • 521 • 192
MAmmoTH-VL/MAmmoTH-VL-8B

8B • Updated Dec 9, 2024 • 20 • 19
MAmmoTH-VL/MAmmoTH-VL-Instruct-12M

Viewer • Updated Jan 5 • 37M • 2.34k • 63
OpenGVLab/PVC-InternVL2-8B

Image-Text-to-Text • 10B • Updated Dec 17, 2024 • 74 • 9
BGLab/BioTrove

Viewer • Updated Dec 13, 2024 • 163M • 1.16k • 17
TencentARC/NVComposer

Image-to-3D • Updated Dec 16, 2024 • 55 • 7
deepseek-ai/deepseek-vl2

Image-Text-to-Text • 27B • Updated Dec 18, 2024 • 3.9k • 371
FastVideo/FastHunyuan

Text-to-Video • Updated Jan 8 • 57 • 191
BAAI/nova-d48w1536-sdxl1024

Text-to-Image • Updated Dec 21, 2024 • 14 • 7
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated Dec 25, 2024 • 286 • 610
Infinigence/Megrez-3B-Omni

4B • Updated Feb 14 • 25 • 135
microsoft/VidTok

Updated Apr 5 • 42
TIGER-Lab/Mantis-8B-siglip-llama3

Image-to-Text • 8B • Updated Nov 15, 2024 • 492 • 33
OpenGVLab/HoVLE-HD

Image-Text-to-Text • 3B • Updated Feb 9 • 66 • 8
nyu-visionx/cambrian-34b

Text Generation • 35B • Updated Jun 28, 2024 • 23 • 27
nyu-visionx/cambrian-phi3-3b

Text Generation • 4B • Updated Jul 6, 2024 • 217 • 11
nyu-visionx/Cambrian-Alignment

Viewer • Updated Jul 23, 2024 • 292k • 8.11k • 38
nvidia/Cosmos-1.0-Autoregressive-13B-Video2World

Updated Feb 8 • 52 • 32
nvidia/Cosmos-1.0-Diffusion-14B-Video2World

Updated May 7 • 1.72k • 56
nvidia/Cosmos-1.0-Diffusion-14B-Text2World

Updated May 7 • 1.88k • 60
nvidia/Cosmos-1.0-Autoregressive-12B

Updated Feb 11 • 43 • 30
StephanST/WALDO30

Object Detection • Updated Jun 23 • 243
ByteDance/Sa2VA-8B

Image-Text-to-Text • 8B • Updated Sep 8 • 609 • 65
OpenGVLab/VideoChat-Flash-Qwen2_5-2B_res448

Video-Text-to-Text • 2B • Updated Mar 16 • 2.17k • 26
OpenGVLab/VideoMAEv2-giant

Video Classification • 1B • Updated Feb 25 • 4.12k • 4
MiniMaxAI/MiniMax-VL-01

Image-Text-to-Text • 456B • Updated Jul 3 • 91.5k • 280
NimVideo/mochi-1-transformer-42

Text-to-Video • Updated Jan 13 • 29 • 3
ostris/Flex.1-alpha

Text-to-Image • Updated Jan 19 • 1.23k • 481
tencent/Hunyuan3D-2

Image-to-3D • Updated Oct 17 • 80.7k • 1.68k
deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated Feb 1 • 8.02k • 465
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 59.8k • 3.54k
Qwen/Qwen2.5-VL-72B-Instruct

Image-Text-to-Text • 73B • Updated Jun 6 • 117k • 569
nvidia/Eagle2-9B

Image-Text-to-Text • 9B • Updated Jan 28 • 360 • 62
m-a-p/PIN-200M

Viewer • Updated about 13 hours ago • 68.1k • 91.4k • 20
AIDC-AI/Ovis2-34B

Image-Text-to-Text • 35B • Updated Aug 15 • 52.5k • 151
microsoft/OmniParser-v2.0

Updated Mar 28 • 889 • 1.31k
Alpha-VLLM/Lumina-Image-2.0

Text-to-Image • Updated Mar 30 • 1.69k • 348
prithivMLmods/JSONify-Flux

Image-Text-to-Text • 2B • Updated Feb 16 • 9 • 3
Skywork/SkyReels-V1-Hunyuan-I2V

Image-to-Video • Updated Feb 24 • 597 • 274
Skywork/SkyReels-A1

Image-to-Video • Updated Mar 4 • 37 • 64
AIDC-AI/Ovis2-16B

Image-Text-to-Text • 16B • Updated Aug 15 • 10.6k • 101
curateIT/themet_openaccess_bestof

Viewer • Updated Apr 7, 2024 • 1.77k • 15 • 1
MnLgt/yolo-human-parse

Image Classification • Updated Sep 19, 2024 • 27 • 11
google/paligemma2-3b-mix-448

Image-Text-to-Text • 3B • Updated Feb 7 • 5.62k • 53
google/paligemma2-28b-mix-448

Image-Text-to-Text • 28B • Updated Feb 7 • 68 • 27
HuggingFaceTB/SmolVLM2-2.2B-Instruct

Image-Text-to-Text • 2B • Updated Apr 8 • 145k • 287
Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated Mar 12 • 30.3k • 1.43k
allenai/olmOCR-7B-0225-preview

Image-to-Text • 8B • Updated Aug 19 • 7.05k • 703
microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated May 1 • 396k • 1.55k
briaai/BRIA-4B-Adapt

Text-to-Image • Updated Jun 11 • 225 • 8
DAMO-NLP-SG/VideoLLaMA3-7B

Video-Text-to-Text • 8B • Updated Sep 2 • 88.6k • 71
ali-vilab/ACE_Plus

Updated Mar 14 • 63 • 293
ByteDance/LatentSync-1.5

Updated Jun 12 • 79.7k • 83
IDEA-Research/RexSeek-3B

Image-Text-to-Text • 4B • Updated Mar 14 • 488 • 10
TIGER-Lab/Vamba-Qwen2-VL-7B

Video-Text-to-Text • 11B • Updated Mar 18 • 96 • 16
docling-project/SmolDocling-256M-preview

Image-Text-to-Text • 0.3B • Updated Sep 17 • 126k • 1.6k
nvidia/Cosmos-Predict1-14B-Video2World

Updated Apr 8 • 56 • 4
nvidia/Cosmos-Transfer1-7B

Updated 21 days ago • 1.37k • 58
CohereLabs/aya-vision-32b

Image-Text-to-Text • 33B • Updated Oct 30 • 140 • 217
ByteDance/Sa2VA-26B

Image-Text-to-Text • 26B • Updated Sep 8 • 61 • 31
ChaolongYang/KDTalker

Image-to-Video • Updated Mar 30 • 13
Rapidata/OpenAI-4o_t2i_human_preference

Viewer • Updated Mar 28 • 13k • 509 • 34
McGill-NLP/AURORA

Image-to-Image • Updated Dec 21, 2024 • 18 • 4
HiDream-ai/MotionPro

Image-to-Video • Updated May 27 • 87
RaphaelLiu/Pusa-V0.5

Updated Jul 23 • 84 • 46
OpenGVLab/InternVL3-38B

Image-Text-to-Text • 38B • Updated Sep 11 • 55.7k • 43
ShoufaChen/PixelFlow-Text2Image

Text-to-Image • Updated Apr 12 • 13
FoundationVision/Infinity

Updated Jun 24 • 46 • 61
nvidia/PhysicalAI-SmartSpaces

Updated Oct 19 • 6.42k • 56
nvidia/DAM-3B-Video

Image-Text-to-Text • Updated May 7 • 3.26k • 57
nvidia/DAM-3B-Self-Contained

Image-Text-to-Text • Updated May 7 • 717 • 24
OpenGVLab/VideoChat-R1_7B

Video-Text-to-Text • 8B • Updated Apr 22 • 487 • 8
Skywork/SkyCaptioner-V1

Video-Text-to-Text • 8B • Updated Apr 25 • 300 • 49
Fintor/Fintor-GUI-S2

Image-Text-to-Text • 8B • Updated Apr 24 • 19 • 4
ByteDance-Seed/UI-TARS-7B-DPO

Image-Text-to-Text • 8B • Updated Jan 25 • 1.29k • 221
OpenGVLab/InternVL_2_5_HiCo_R64

Video-Text-to-Text • 8B • Updated May 13 • 83 • 3
ByteDance/Q-Insight

Updated May 29 • 15
osunlp/UGround-V1-7B

Image-Text-to-Text • 8B • Updated Apr 16 • 642 • 19
echo840/MonkeyOCR

Image-Text-to-Text • Updated Aug 28 • 703 • 512
showlab/show-o2-7B

Any-to-Any • Updated Sep 5 • 129 • 15
ETH-CVG/lightglue_disk

Keypoint Detection • 13.6M • Updated Jul 17 • 9.99k • 13
TencentARC/ARC-Hunyuan-Video-7B

Video-Text-to-Text • 9B • Updated Sep 19 • 558 • 30
Skywork/Matrix-3D

Image-to-3D • Updated Sep 2 • 49
LiquidAI/LFM2-VL-1.6B

Image-Text-to-Text • 2B • Updated 5 days ago • 3.21k • 212
nvidia/VideoITG-8B

Image-Text-to-Text • 8B • Updated Aug 13 • 276 • 7
allenai/olmOCR-7B-0725

Image-Text-to-Text • 8B • Updated Aug 26 • 1.16k • 62
internlm/Intern-S1-mini

Image-Text-to-Text • 9B • Updated Oct 31 • 3.42k • 102
AIDC-AI/Ovis2.5-9B

Image-Text-to-Text • 9B • Updated Oct 24 • 10.9k • 298
openbmb/MiniCPM-V-4_5

Image-Text-to-Text • 9B • Updated Oct 10 • 49.5k • 1.02k
apple/FastVLM-7B

Text Generation • 8B • Updated Sep 3 • 737 • 263
apple/MobileCLIP2-S3

Updated Oct 9 • 53 • 4
apple/MobileCLIP2-S2

Updated Oct 9 • 91 • 9
inclusionAI/UI-Venus-Ground-72B

Image-Text-to-Text • 73B • Updated Aug 19 • 706 • 11
PaddlePaddle/PP-OCRv5_mobile_det

Image-to-Text • Updated Jul 22 • 86.2k • 16
Hcompany/Holo1.5-72B

Image-Text-to-Text • 73B • Updated Sep 24 • 58 • 25
facebook/map-anything

Image-to-3D • 0.6B • Updated Sep 22 • 62.4k • 50
YannQi/R-4B

Image-Text-to-Text • 5B • Updated Sep 4 • 53.9k • 172
decart-ai/Lucy-Edit-Dev

Video-to-Video • Updated 20 days ago • 465 • 311
TencentARC/ARC-Qwen-Video-7B-Narrator

Video-Text-to-Text • 9B • Updated Sep 21 • 47 • 7
manycore-research/SpatialLM1.1-Qwen-0.5B

Text Generation • 0.6B • Updated Sep 23 • 8.81k • 25
PerceptronAI/Isaac-0.1

Text Generation • 3B • Updated Oct 9 • 4.22k • 112
internlm/CapRL-3B

Image-Text-to-Text • 4B • Updated Oct 22 • 374 • 45
nvidia/Audio2Face-3D-v3.0

Updated Oct 21 • 221 • 48
nvidia/nemotron-table-structure-v1

Object Detection • Updated 19 days ago • 153 • 19
datalab-to/chandra

Image-to-Text • 9B • Updated Oct 21 • 89.3k • 409
allenai/olmOCR-2-7B-1025

Image-to-Text • 8B • Updated Oct 22 • 33.5k • 89
stepfun-ai/GELab-Zero-4B-preview

Image-to-Text • 4B • Updated 9 days ago • 796 • 92
