📄 DiT-Base Document Classifier (Fine-Tuned for Financial Documents)

🧠 Model Description

This model is a fine-tuned version of microsoft/dit-base (Document Image Transformer) for image-based document classification. It is specifically optimized to identify financial document types:

  • bank_checks
  • invoices
  • receipts

The model was trained on real-world scanned images and is suited to enterprise automation tasks such as accounts payable automation, banking workflows, and OCR pre-classification.


📌 Model Overview

| Feature | Description |
|---|---|
| Base model | microsoft/dit-base |
| Task | Single-label image classification |
| Number of classes | 3 |
| Input format | RGB image |
| Output | Per-class logits (apply softmax for probabilities) and the predicted class label |
| Use case | Financial document routing and automation |

📊 Performance Metrics (Validation Set)

| Metric | Score |
|---|---|
| Accuracy | 0.94+ |
| F1 (weighted) | 0.94+ |
| ROC-AUC (macro) | 0.98+ |
| PR-AUC (macro) | 0.97+ |
| Mean Brier score | ~0.03 |

The high ROC-AUC and PR-AUC values indicate that the model separates the three classes well, and the low Brier score is consistent with well-calibrated probabilities.
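
The card does not say how these scores were computed. The stdlib-only sketch below is an illustration, not the author's evaluation code, and assumes the usual conventions: weighted F1 weights each class's F1 by its number of true examples (the scikit-learn convention), and macro ROC-AUC averages one-vs-rest AUCs over the three classes. All function names are hypothetical; `y_true`/`y_pred` are integer class ids and `probs` holds one probability per class for each image.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of images whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Per-class F1 averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    weighted_sum = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weighted_sum += n * f1
    return weighted_sum / len(y_true)


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """ROC-AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Average of one-vs-rest ROC-AUCs, one per class column of probs."""
    num_classes = len(probs[0])
    aucs: List[float] = [
        binary_auc([row[k] for row in probs], [t == k for t in y_true])
        for k in range(num_classes)
    ]
    return sum(aucs) / num_classes


# Toy batch with made-up numbers; the third image (class 2) is predicted as class 1
y_true = [0, 1, 2, 0]
probs = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.1, 0.5, 0.4], [0.6, 0.3, 0.1]]
y_pred = [row.index(max(row)) for row in probs]
print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred), macro_ovr_auc(y_true, probs))
```

The toy batch gets a perfect macro ROC-AUC despite one wrong prediction, because AUC only checks whether each class's true images are ranked above the others, not whether the arg-max is right. The pairwise AUC loop is quadratic and meant for illustration; on a full validation set, scikit-learn's `accuracy_score`, `f1_score(average="weighted")`, `roc_auc_score(multi_class="ovr", average="macro")`, and `average_precision_score` (on one-hot labels, for macro PR-AUC) are the usual tools.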


📈 Calibration & Reliability

  • Low Brier scores (~0.03) are consistent with well-calibrated probabilities (the Brier score mixes calibration with overall accuracy, so it is not a pure calibration measure)
  • Reliability curves closely follow the diagonal, so predicted confidence tracks observed accuracy
  • Confidence histograms show that most correct predictions are made with high confidence
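
A minimal sketch of the two diagnostics above, assuming "mean Brier score" means the average of the three one-vs-rest Brier scores (the card does not define it; this equals the multiclass Brier score divided by the number of classes). The reliability bins group predictions by top-class confidence and compare mean confidence with accuracy in each bin, which is the data behind a reliability curve. The expected calibration error (ECE) summary is an addition not reported in the card, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_ovr_brier(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Average over classes of the one-vs-rest Brier score for each probability column."""
    num_classes = len(probs[0])
    per_class = []
    for k in range(num_classes):
        errors = [(row[k] - (1.0 if t == k else 0.0)) ** 2 for row, t in zip(probs, y_true)]
        per_class.append(sum(errors) / len(errors))
    return sum(per_class) / num_classes


def reliability_bins(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], n_bins: int = 10
) -> List[Tuple[int, int, float, float]]:
    """Group predictions by top-class confidence; return (bin, count, mean confidence, accuracy)."""
    buckets: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for row, t in zip(probs, y_true):
        confidence = max(row)
        predicted = list(row).index(confidence)
        idx = min(int(confidence * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        buckets[idx].append((confidence, predicted == t))
    summary = []
    for b, items in enumerate(buckets):
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            acc = sum(ok for _, ok in items) / len(items)
            summary.append((b, len(items), mean_conf, acc))
    return summary


def expected_calibration_error(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], n_bins: int = 10
) -> float:
    """Count-weighted average gap between accuracy and mean confidence across bins."""
    n = len(y_true)
    return sum(
        count / n * abs(acc - conf)
        for _, count, conf, acc in reliability_bins(y_true, probs, n_bins)
    )
```

Plotting each bin's accuracy against its mean confidence gives the reliability curve; points on the diagonal mean the stated confidence matches the observed hit rate.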

📂 Dataset

  • Total images (after cleaning and deduplication): ~14,155
  • The classes are moderately imbalanced; receipts are the smallest class
  • Images are single-page scanned documents

| Class | Image count |
|---|---|
| bank_checks | ~5,079 |
| invoices | ~5,848 |
| receipts | ~3,228 |
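
Using the approximate counts from the dataset table, a quick calculation shows how far the class distribution is from uniform (a balanced three-class set would have about 33% per class). The counts are the card's rounded figures, not exact values.

```python
# Approximate per-class image counts from the dataset table
counts = {"bank_checks": 5079, "invoices": 5848, "receipts": 3228}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"total: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

With these counts the shares come to roughly 35.9% (bank_checks), 41.3% (invoices), and 22.8% (receipts), so invoices outnumber receipts by about 1.8 to 1. This is why weighted F1 and macro-averaged AUCs are reported alongside accuracy.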

⚙️ Training Configuration

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 5e-6 |
| Batch size | 8 |
| Epochs | 10 |
| Mixed precision | FP16 |
| PyTorch compile (torch.compile) | Enabled |
| Data augmentation | Resize, shift/scale/rotate, blur, brightness/contrast, noise |
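
The card does not publish its training script. As a hypothetical sketch, the table can be captured in a small config object whose field names match Hugging Face `TrainingArguments` keyword arguments (`optim`, `learning_rate`, `per_device_train_batch_size`, `num_train_epochs`, `fp16`, `torch_compile`), in case you want to reproduce the run with `Trainer`. The `DiTTrainingConfig` class and `estimate_total_steps` helper are illustrative names. Because the train/validation split is not stated, the example uses the full ~14,155 images as an upper bound.

```python
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DiTTrainingConfig:
    """Hyperparameters from the training table (hypothetical container)."""
    optim: str = "adamw_torch"            # AdamW
    learning_rate: float = 5e-6
    per_device_train_batch_size: int = 8
    num_train_epochs: int = 10
    fp16: bool = True                     # FP16 mixed precision
    torch_compile: bool = True            # torch.compile enabled


def estimate_total_steps(cfg: DiTTrainingConfig, num_train_images: int) -> int:
    """Optimizer steps on one device, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_images / cfg.per_device_train_batch_size)
    return steps_per_epoch * cfg.num_train_epochs


cfg = DiTTrainingConfig()
print(asdict(cfg))
# Upper bound: part of the ~14,155 images was held out for validation
print(estimate_total_steps(cfg, 14155))
```

With the full dataset this gives 1,770 steps per epoch and 17,700 steps overall; the real figure is lower because of the validation split. In recent `transformers` releases, `TrainingArguments(output_dir="...", **asdict(cfg))` should accept these fields (FP16 requires a CUDA GPU).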

🤖 Intended Use Cases

✅ Recommended:

  • Financial document routing
  • Pre-OCR classification
  • Automated invoice/check/receipt workflows
  • Embedding in RPA/ERP systems

⚠️ Use with caution:

  • Regulatory or compliance-sensitive decisions
  • Documents outside the financial domain

❌ Not intended for:

  • Legal document interpretation
  • Multilingual handwriting recognition (unless fine-tuned further)
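
For the routing use cases, a common pattern is to accept the model's prediction only when its top-class probability clears a threshold and send everything else to a person. The sketch below is illustrative: `route_document`, the 0.9 threshold, the `manual_review` queue name, and the label order in the example mapping are all hypothetical (read the real mapping from `model.config.id2label`). A threshold is only a partial guard for out-of-domain documents, because the model always spreads its probability over the three financial classes and can still be confident on an unrelated image.

```python
from typing import Mapping, Sequence

MANUAL_REVIEW = "manual_review"  # hypothetical fallback queue name


def route_document(
    probs: Sequence[float], id2label: Mapping[int, str], threshold: float = 0.9
) -> str:
    """Return the predicted label if its probability meets the threshold, else the review queue."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return id2label[best]
    return MANUAL_REVIEW


# Hypothetical label order; use model.config.id2label in practice
example_id2label = {0: "bank_checks", 1: "invoices", 2: "receipts"}
print(route_document([0.02, 0.95, 0.03], example_id2label))  # invoices
print(route_document([0.40, 0.35, 0.25], example_id2label))  # manual_review
```

Since the card reports well-calibrated probabilities, the threshold roughly sets the expected error rate of auto-routed documents, but it should be tuned on your own validation data.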

📦 How to Use

```python
from transformers import AutoModelForImageClassification, AutoImageProcessor
from PIL import Image
import torch

model_id = "victormurcia/DiT-DocumentClassifier-v1"

# Load the fine-tuned model and its matching image processor
model = AutoModelForImageClassification.from_pretrained(model_id)
processor = AutoImageProcessor.from_pretrained(model_id)

# Load a scanned document and force 3-channel RGB input
img = Image.open("your_image.jpg").convert("RGB")

# Resize and normalize the image into a pixel_values tensor
inputs = processor(images=img, return_tensors="pt")

# Inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class = logits.argmax(-1).item()

print("Predicted document type:", model.config.id2label[predicted_class])
```
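
The example above reports only the arg-max label. To get class probabilities, apply a softmax to the logits (in PyTorch, `torch.softmax(logits, dim=-1)`). The stdlib sketch below shows the same computation on a plain list, with a hypothetical `top_k_labels` helper; pass it `outputs.logits[0].tolist()` and `model.config.id2label`. The demo logits and label order are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits and label order, for illustration only
demo_labels = {0: "bank_checks", 1: "invoices", 2: "receipts"}
for label, p in top_k_labels([0.3, 2.1, -1.2], demo_labels):
    print(f"{label}: {p:.3f}")
```

Subtracting the maximum logit leaves the probabilities unchanged but prevents `math.exp` from overflowing on large logits.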

Model size: 85.8M parameters (Safetensors, F32 tensors)

Model tree: victormurcia/DiT-DocumentClassifier-v1 is fine-tuned from the base model microsoft/dit-base.