📄 DiT-Base Document Classifier (Fine-Tuned for Financial Documents)

🧠 Model Description

This model is a fine-tuned version of microsoft/dit-base (Document Image Transformer) for image-based document classification. It is fine-tuned to distinguish three financial document types:

bank_checks
invoices
receipts

The model is trained on real-world scanned images and is suitable for enterprise automation tasks such as accounts payable automation, banking workflows, and OCR pre-classification.

📌 Model Overview

Feature	Description
Base Model	microsoft/dit-base
Task	Single-label image classification
Number of Classes	3
Input Format	RGB Image
Output	Per-class logits (apply softmax for probabilities); predicted label via argmax
Use Case	Financial document routing and automation

📊 Performance Metrics (Validation Set)

Metric	Score
Accuracy	0.94+
F1 (Weighted)	0.94+
ROC-AUC (Macro)	0.98+
PR-AUC (Macro)	0.97+
Mean Brier Score	~0.03

The ROC-AUC and PR-AUC scores indicate strong class separation on the validation set, and the low mean Brier score is consistent with well-calibrated probabilities.

📈 Calibration & Reliability

Low Brier Scores (~0.03) indicate well-calibrated probabilities
Reliability curves closely follow the diagonal, so predicted confidence tracks observed accuracy
Confidence histograms reveal a high proportion of high-confidence correct predictions
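
The calibration checks above can be reproduced on any labelled validation set. The following is a minimal sketch, not the evaluation code used for this model: it assumes the "mean Brier score" is the average of per-class one-vs-rest Brier scores, and the function names and toy data are hypothetical.

```python
def per_class_brier(probs, labels, num_classes):
    """One-vs-rest Brier score for each class: mean squared error between
    the predicted probability of class k and the 0/1 indicator for class k."""
    n = len(probs)
    scores = []
    for k in range(num_classes):
        sq_err = sum((p[k] - (1.0 if y == k else 0.0)) ** 2
                     for p, y in zip(probs, labels))
        scores.append(sq_err / n)
    return scores


def reliability_bins(probs, labels, num_bins=10):
    """Bucket predictions by top-class confidence and return
    (mean confidence, accuracy, count) for each non-empty bucket.
    A well-calibrated model has mean confidence close to accuracy."""
    bins = [[] for _ in range(num_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, pred == y))
    result = []
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            result.append((mean_conf, accuracy, len(bucket)))
    return result


# Toy softmax outputs for 4 images (illustrative values, not model results).
toy_probs = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6], [0.6, 0.3, 0.1]]
toy_labels = [0, 1, 2, 1]

class_scores = per_class_brier(toy_probs, toy_labels, num_classes=3)
mean_brier = sum(class_scores) / len(class_scores)
bins = reliability_bins(toy_probs, toy_labels)
print("mean Brier:", round(mean_brier, 4))
print("reliability bins:", bins)
```

Plotting mean confidence against accuracy for each bin gives the reliability curve; points near the diagonal correspond to the behaviour described above.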

📂 Dataset

Total images (after cleaning & deduplication): ~14,155
Classes are moderately imbalanced: invoices are the largest class (~41%) and receipts the smallest (~23%)
Images are single-page scanned documents

Class	Image Count
bank_checks	~5079
invoices	~5848
receipts	~3228
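
As a quick check on the table above, the per-class shares and a set of inverse-frequency class weights can be derived from the counts. This is only a sketch: the counts are approximate, and whether class weighting was used during training is not stated here, so the weights are an illustrative assumption.

```python
# Approximate image counts taken from the table above.
counts = {"bank_checks": 5079, "invoices": 5848, "receipts": 3228}

total = sum(counts.values())

# Fraction of the dataset belonging to each class.
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights, total / (num_classes * count): classes with
# fewer images get a weight above 1. Hypothetical; not confirmed as part
# of the training setup.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: share={shares[name]:.3f}, weight={weights[name]:.3f}")
```

If weighting were desired, weights of this form could be passed to a weighted cross-entropy loss, in the same class order as the model's label mapping.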

⚙️ Training Configuration

Setting	Value
Optimizer	AdamW
Learning Rate	5e-6
Batch Size	8
Epochs	10
Mixed Precision	FP16
PyTorch Compile	Enabled
Data Augmentation	Resize, Shift/Scale/Rotate, Blur, Brightness/Contrast, Noise

🤖 Intended Use Cases

✅ Recommended:

Financial document routing
Pre-OCR classification
Automated invoice/check/receipt workflows
Embedding in RPA/ERP systems

⚠️ Use with caution:

Regulatory or compliance-sensitive decisions
Documents outside the financial domain

❌ Not intended for:

Legal document interpretation
Multilingual handwriting recognition (unless fine-tuned further)

📦 How to Use

from transformers import AutoModelForImageClassification, AutoImageProcessor
from PIL import Image
import torch

# Load model and processor
model = AutoModelForImageClassification.from_pretrained("victormurcia/DiT-DocumentClassifier")
processor = AutoImageProcessor.from_pretrained("victormurcia/DiT-DocumentClassifier")

# Load image
img = Image.open("your_image.jpg").convert("RGB")

# Preprocess
inputs = processor(images=img, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = logits.argmax(-1).item()

print("Predicted document type:", model.config.id2label[predicted_class])
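
The snippet above returns only the top class. For routing workflows it can help to convert the logits to probabilities and send low-confidence predictions to manual review. The sketch below shows the idea in plain Python; the `route` helper, the 0.9 threshold, the `manual_review` label, and the label order in `ID2LABEL` are all assumptions for illustration (the real order comes from `model.config.id2label`). With the PyTorch tensors from the snippet above, `torch.softmax(logits, dim=-1)` would give the same probabilities.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, id2label, threshold=0.9):
    """Return (label, confidence); fall back to 'manual_review' when the
    top probability is below the threshold (threshold is an assumption)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "manual_review", probs[best]
    return id2label[best], probs[best]


# Hypothetical label order; read the real one from model.config.id2label.
ID2LABEL = {0: "bank_checks", 1: "invoices", 2: "receipts"}

confident = route([4.0, 0.5, 0.2], ID2LABEL)
uncertain = route([1.0, 0.8, 0.1], ID2LABEL)
print(confident)
print(uncertain)
```

The threshold should be tuned on held-out data, trading the share of documents sent to manual review against the error rate on automatically routed ones.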


Model size: 85.8M parameters (Safetensors, F32 tensors)
