Efficient Vision Encoding for Vision Language Models
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 72
- FastVLM WebGPU
  Real-time video captioning powered by FastVLM • 418
- apple/FastVLM-0.5B
  Text Generation • 0.8B • Updated • 10.1k • 354
- apple/FastVLM-1.5B
  Text Generation • 2B • Updated • 2.06k • 71