- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 16
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 99
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 15
Collections including paper arxiv:2405.20204
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 39
- RMT: Retentive Networks Meet Vision Transformers
  Paper • 2309.11523 • Published • 33
- Guiding a Diffusion Model with a Bad Version of Itself
  Paper • 2406.02507 • Published • 17
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 37
- LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
  Paper • 2306.17107 • Published • 11
- On the Hidden Mystery of OCR in Large Multimodal Models
  Paper • 2305.07895 • Published • 1
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 53