Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Visualize image depth, segmentation, and generation