Document datasets with .pdf files that are usable with pixparse libraries and tools.
AI & ML interests
Document and User Interface Parsing, Understanding, Q&A.
Organization Card
Multi-modal document, image, and text datasets and models for document understanding, OCR, and VQA tasks.
GitHub repos:
- Data Loading: chug - https://github.com/huggingface/chug
- Modelling: pixparse - coming soon
models 0
None public yet
datasets 6
pixparse/pdfa-eng-wds
pixparse/idl-wds
pixparse/docvqa-wds
pixparse/docvqa-single-page-questions
pixparse/cc12m-wds
pixparse/cc3m-wds
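Most of the dataset names listed here end in `-wds`, which presumably stands for WebDataset-style tar shards; the card itself does not say so, so treat that as an assumption. Under that convention, files in a shard that share a basename form one sample. The sketch below illustrates the grouping idea using only the Python standard library on a tiny in-memory shard. It is not the chug API, and the file names, `make_demo_shard`, and `iter_samples` are hypothetical names introduced for illustration.

```python
import io
import tarfile


def make_demo_shard() -> bytes:
    """Build a tiny in-memory tar shard holding two hypothetical samples."""
    files = {
        "doc0001.pdf": b"%PDF-1.4 demo",
        "doc0001.json": b'{"pages": 1}',
        "doc0002.pdf": b"%PDF-1.4 demo 2",
        "doc0002.json": b'{"pages": 3}',
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def iter_samples(shard_bytes: bytes):
    """Yield (key, {extension: bytes}) for consecutive files sharing a basename."""
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue
            # Split at the first dot; assumes no dots in any directory part.
            key, _, ext = member.name.partition(".")
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield current_key, sample


samples = list(iter_samples(make_demo_shard()))
for key, parts in samples:
    print(key, sorted(parts))
```

Reading members in order and cutting a new sample whenever the basename changes keeps memory use bounded by one sample, which is the main reason sharded tar layouts suit streaming over large document collections.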