DiT-Base Document Classifier (Fine-Tuned for Financial Documents)
Model Description
This model is a fine-tuned version of microsoft/dit-base (Document Image Transformer) for image-based document classification. It is specifically optimized to identify financial document types:
- bank_checks
- invoices
- receipts
The model is trained on real-world scanned images and is suitable for enterprise automation tasks such as accounts payable automation, banking workflows, and OCR pre-classification.
Model Overview
| Feature | Description |
|---|---|
| Base Model | microsoft/dit-base |
| Task | Single-label image classification |
| Number of Classes | 3 |
| Input Format | RGB Image |
| Output | Class label with logits/probabilities |
| Use Case | Financial document routing and automation |
Performance Metrics (Validation Set)
| Metric | Score |
|---|---|
| Accuracy | 0.94+ |
| F1 (Weighted) | 0.94+ |
| ROC-AUC (Macro) | 0.98+ |
| PR-AUC (Macro) | 0.97+ |
| Mean Brier Score | ~0.03 |
The high ROC-AUC and PR-AUC values indicate strong class separation on the validation set, and the low Brier score suggests that the predicted probabilities are well calibrated.
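For reference, a Brier score like the one quoted above can be computed from predicted probabilities and true labels. The sketch below assumes that "mean" refers to the average of per-class one-vs-rest Brier scores; the card does not say which variant was used, and the probabilities and labels are made-up illustrative values, not outputs of this model.

```python
def mean_brier_score(probs, labels, num_classes):
    """Average of per-class (one-vs-rest) Brier scores.

    probs: list of per-sample probability lists, each of length num_classes.
    labels: list of integer class indices.
    """
    per_class = []
    for c in range(num_classes):
        # Squared error between predicted P(class c) and the 0/1 indicator.
        sq_err = [(p[c] - (1.0 if y == c else 0.0)) ** 2
                  for p, y in zip(probs, labels)]
        per_class.append(sum(sq_err) / len(sq_err))
    return sum(per_class) / num_classes


# Illustrative values only (not outputs of this model).
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
]
labels = [0, 1, 2]
print(round(mean_brier_score(probs, labels, num_classes=3), 4))
```

A score of 0 corresponds to perfectly confident, correct predictions; larger values mean the probabilities sit further from the true labels.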
Calibration & Reliability
- Low Brier Scores (~0.03) indicate well-calibrated probabilities
- Reliability curves closely follow the diagonal, indicating trustworthy confidence scores
- Confidence histograms reveal a high proportion of high-confidence correct predictions
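A reliability curve of the kind described above can be built by binning predictions by confidence and comparing the mean confidence with the accuracy in each bin. This is a minimal sketch, assuming equal-width bins over the top-class confidence. The bin count, the helper names, and the sample values are illustrative and not taken from this model's evaluation.

```python
def reliability_bins(confidences, correct, num_bins=10):
    """Return (mean_confidence, accuracy, count) for each non-empty bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, ok))
    rows = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            rows.append((mean_conf, accuracy, len(members)))
    return rows


def expected_calibration_error(confidences, correct, num_bins=10):
    """Count-weighted mean gap between confidence and accuracy across bins."""
    total = len(confidences)
    return sum(n / total * abs(conf - acc)
               for conf, acc, n in reliability_bins(confidences, correct, num_bins))


# Illustrative values only (not outputs of this model).
confidences = [0.95, 0.92, 0.98, 0.65, 0.62]
correct = [True, True, True, True, False]
for mean_conf, accuracy, count in reliability_bins(confidences, correct):
    print(f"confidence={mean_conf:.2f} accuracy={accuracy:.2f} n={count}")
```

Plotting accuracy against mean confidence per bin gives the reliability curve; points near the diagonal mean the stated confidence matches the observed accuracy.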
Dataset
- Total images (after cleaning & deduplication): ~14,155
- Classes are only roughly balanced: about 36% bank checks, 41% invoices, and 23% receipts
- Images are single-page scanned documents
| Class | Image Count |
|---|---|
| bank_checks | ~5,079 |
| invoices | ~5,848 |
| receipts | ~3,228 |
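The approximate per-class counts in the table can be turned into class shares with a few lines of arithmetic. This small sketch uses only the counts listed above.

```python
# Approximate counts taken from the table above.
counts = {"bank_checks": 5079, "invoices": 5848, "receipts": 3228}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

print("total:", total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```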
Training Configuration
| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning Rate | 5e-6 |
| Batch Size | 8 |
| Epochs | 10 |
| Mixed Precision | FP16 |
| PyTorch Compile | Enabled |
| Data Augmentation | Resize, Shift/Scale/Rotate, Blur, Brightness/Contrast, Noise |
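The settings in the table map naturally onto keyword arguments of Hugging Face `TrainingArguments`. The sketch below is an assumption about how such a run could be configured, not the author's actual training script. The card does not state whether `Trainer` was used, whether the batch size of 8 is per device, or which AdamW implementation was chosen. Data augmentation is not covered here because it is applied in the data pipeline rather than through these arguments.

```python
# Hypothetical mapping of the table onto transformers.TrainingArguments kwargs.
training_kwargs = {
    "optim": "adamw_torch",            # AdamW (PyTorch implementation assumed)
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 8,  # assumes the batch size is per device
    "num_train_epochs": 10,
    "fp16": True,                      # mixed precision; typically needs a GPU
    "torch_compile": True,             # requires PyTorch 2.x
}

# With transformers installed, these could be passed as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="dit-finetune", **training_kwargs)
# where "dit-finetune" is a placeholder output directory.
```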
Intended Use Cases
Recommended:
- Financial document routing
- Pre-OCR classification
- Automated invoice/check/receipt workflows
- Embedding in RPA/ERP systems
Use with caution:
- Regulatory or compliance-sensitive decisions
- Documents outside the financial domain
Not intended for:
- Legal document interpretation
- Multilingual handwriting recognition (unless fine-tuned further)
How to Use
```python
from transformers import AutoModelForImageClassification, AutoImageProcessor
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("victormurcia/DiT-DocumentClassifier")
processor = AutoImageProcessor.from_pretrained("victormurcia/DiT-DocumentClassifier")

# Load image
img = Image.open("your_image.jpg").convert("RGB")

# Preprocess
inputs = processor(images=img, return_tensors="pt")

# Inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map it to its label
logits = outputs.logits
predicted_class = logits.argmax(-1).item()
print("Predicted document type:", model.config.id2label[predicted_class])
```
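Because the model returns raw logits, routing workflows often need probabilities and a fallback for uncertain predictions. The sketch below shows one way to do that, assuming the logits have been converted to a plain list (for example with `logits[0].tolist()`). The `route` helper, the 0.90 threshold, the `manual_review` label, and the label order in `id2label` are illustrative assumptions, not part of the model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, id2label, threshold=0.90):
    """Return (label, confidence); fall back to manual review below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = id2label[best] if probs[best] >= threshold else "manual_review"
    return label, probs[best]


# Hypothetical label map and logits; the real map is model.config.id2label.
id2label = {0: "bank_checks", 1: "invoices", 2: "receipts"}
print(route([4.0, 0.5, 0.2], id2label))  # confident prediction
print(route([1.0, 0.8, 0.7], id2label))  # ambiguous prediction
```

The threshold is a business decision: a higher value sends more documents to manual review in exchange for fewer misrouted ones.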