deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane that it can parse charts and re-render them in HTML
> it concatenates CLIP and SAM features, so grounding is better
> very efficient in performance per vision token
> covers 100 languages
IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥
> not only a document converter: it can also do document question answering and understand multiple languages 🤯
> best part: released under the Apache 2.0 license, so you can use it in your commercial projects!
> it supports transformers, vLLM and MLX from the get-go!
> built on SigLIP2 & granite-165M