🧾 Model Card: `nevernever69/dit-doclaynet-segmentation`

🧠 Model Overview

This model is a fine-tuned version of microsoft/dit-base for semantic segmentation of document layouts, trained on a small subset of DocLayNet (nevernever69/small-DocLayNet-v1.1). It assigns every pixel of a scanned document image to one of 11 classes: background plus 10 layout categories such as title, paragraph, table, and footer.

📚 Intended Uses

Segment document images into structured layout elements
Assist in downstream tasks like document OCR, archiving, and automatic annotation
Support researchers and developers working in document AI or digital humanities

🏷️ Labels (11 Classes)

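| ID | Label | Color |
|----|-------|-------|
| 0 | Background | Black |
| 1 | Title | Red |
| 2 | Paragraph | Green |
| 3 | Figure | Blue |
| 4 | Table | Yellow |
| 5 | List | Magenta |
| 6 | Header | Cyan |
| 7 | Footer | Dark Red |
| 8 | Page Number | Dark Green |
| 9 | Footnote | Dark Blue |
| 10 | Caption | Olive |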
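
The label table can be turned into a lookup and a color palette for rendering predicted masks. The sketch below is illustrative only: the card names the colors but not their RGB values, so the triples here are assumptions (full intensity for the primary colors, 128 for the "dark" variants and olive), and `colorize` and `overlay` are hypothetical helper names.

```python
import numpy as np

# Class IDs and names as listed in the label table.
ID2LABEL = {
    0: "Background", 1: "Title", 2: "Paragraph", 3: "Figure",
    4: "Table", 5: "List", 6: "Header", 7: "Footer",
    8: "Page Number", 9: "Footnote", 10: "Caption",
}

# Assumed RGB values for the named colors (not specified in the card).
PALETTE = np.array([
    (0, 0, 0),      # 0 Background: Black
    (255, 0, 0),    # 1 Title: Red
    (0, 255, 0),    # 2 Paragraph: Green
    (0, 0, 255),    # 3 Figure: Blue
    (255, 255, 0),  # 4 Table: Yellow
    (255, 0, 255),  # 5 List: Magenta
    (0, 255, 255),  # 6 Header: Cyan
    (128, 0, 0),    # 7 Footer: Dark Red
    (0, 128, 0),    # 8 Page Number: Dark Green
    (0, 0, 128),    # 9 Footnote: Dark Blue
    (128, 128, 0),  # 10 Caption: Olive
], dtype=np.uint8)


def colorize(mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) integer mask of class IDs to an (H, W, 3) RGB image."""
    return PALETTE[mask]


def overlay(image_rgb: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the colorized mask over an (H, W, 3) uint8 image of the same size."""
    color = colorize(mask).astype(np.float32)
    blended = (1.0 - alpha) * image_rgb.astype(np.float32) + alpha * color
    return blended.round().astype(np.uint8)
```

With a predicted mask in hand, `overlay(np.array(image), mask)` would give an array that `PIL.Image.fromarray` can save for visual inspection.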

🧪 Training Details

Base model: microsoft/dit-base
Dataset: nevernever69/small-DocLayNet-v1.1
Input size: 1025×1025 page images (ground-truth masks resized to 56×56 during training)
Batch size: 8
Epochs: 2
Learning rate: 5e-5
Loss function: Cross-entropy
Hardware: Trained with mixed precision (fp16) on GPU
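
To make the mask resizing and the loss concrete, here is a minimal NumPy sketch. It rests on assumptions: the card does not say how masks were resized, so nearest-neighbour sampling is assumed because it keeps every value a valid class ID, and `resize_mask_nearest` and `pixel_cross_entropy` are hypothetical helpers rather than the author's training code (which presumably used PyTorch).

```python
import numpy as np


def resize_mask_nearest(mask: np.ndarray, size: int = 56) -> np.ndarray:
    """Downsample an (H, W) integer label mask to (size, size).

    Nearest-neighbour sampling (an assumption) picks one source pixel per
    target pixel, so no new label values are invented by interpolation.
    """
    h, w = mask.shape
    rows = (np.arange(size) * h) // size  # source row for each target row
    cols = (np.arange(size) * w) // size  # source column for each target column
    return mask[np.ix_(rows, cols)]


def pixel_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Mean per-pixel cross-entropy for (C, H, W) logits and an (H, W) target."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    h_idx, w_idx = np.indices(target.shape)
    # Pick the log-probability of the true class at every pixel.
    return float(-log_probs[target, h_idx, w_idx].mean())


# Toy example: a 1025x1025 mask whose top rows are labelled Table (ID 4).
full_mask = np.zeros((1025, 1025), dtype=np.int64)
full_mask[:512] = 4
small_mask = resize_mask_nearest(full_mask)          # shape (56, 56)
uniform_logits = np.zeros((11, 56, 56))              # untrained, uniform prediction
loss = pixel_cross_entropy(uniform_logits, small_mask)
```

For uniform logits over 11 classes the loss equals ln(11), which is a handy sanity check for the starting loss of a fresh classification head.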

📊 Evaluation

Evaluation so far is qualitative. On a validation subset, the predicted masks separate distinct document elements with clear boundaries, and overlay visualizations show the segmentation holding up on both dense and sparse regions of historical and modern documents. No quantitative metrics are reported in this card.
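
One way to put numbers on these observations, assuming ground-truth masks are available at the same resolution as the predictions, is per-class intersection-over-union (IoU). The sketch below is not the author's evaluation code; `per_class_iou` and `mean_iou` are hypothetical helpers, and classes absent from both masks are skipped so they do not distort the mean.

```python
import numpy as np


def per_class_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 11) -> dict:
    """Return {class_id: IoU} for every class present in pred or target."""
    ious = {}
    for class_id in range(num_classes):
        pred_c = pred == class_id
        target_c = target == class_id
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither mask
        intersection = np.logical_and(pred_c, target_c).sum()
        ious[class_id] = float(intersection / union)
    return ious


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 11) -> float:
    """Average IoU over the classes present in at least one of the masks."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious)


# Toy example: the prediction finds only half of a Paragraph (ID 2) region.
target = np.zeros((4, 4), dtype=np.int64)
target[:2, :] = 2
pred = np.zeros((4, 4), dtype=np.int64)
pred[:2, :2] = 2
scores = per_class_iou(pred, target)  # {0: 8/12, 2: 4/8}
```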

🚀 How to Use

from transformers import AutoImageProcessor, BeitForSemanticSegmentation
from PIL import Image
import torch

# Use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and its matching image processor
model = BeitForSemanticSegmentation.from_pretrained("nevernever69/dit-doclaynet-segmentation")
image_processor = AutoImageProcessor.from_pretrained("nevernever69/dit-doclaynet-segmentation")
model.to(device).eval()

# Load and preprocess image
image = Image.open("your-image.png").convert("RGB")
inputs = image_processor(images=image, return_tensors="pt").to(device)

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits  # (1, num_labels, h, w), lower resolution than the input
    # PIL reports size as (width, height); interpolate expects (height, width)
    upsampled = torch.nn.functional.interpolate(logits, size=image.size[::-1], mode="bilinear", align_corners=False)
    # Per-pixel class IDs (0-10) as an (H, W) NumPy array
    mask = upsampled.argmax(dim=1).squeeze(0).cpu().numpy()
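
For downstream tasks such as OCR or annotation, the resulting `mask` usually needs to be reduced to something more compact. The following is a minimal sketch, assuming `mask` is an (H, W) integer array as produced above; `summarize_mask` is a hypothetical helper that reports, per class ID, the share of pixels and one bounding box around all pixels of that class (it does not separate multiple instances of the same class).

```python
import numpy as np


def summarize_mask(mask: np.ndarray) -> dict:
    """Return {class_id: {"share": float, "bbox": (x_min, y_min, x_max, y_max)}}."""
    summary = {}
    for class_id in np.unique(mask):
        ys, xs = np.nonzero(mask == class_id)
        summary[int(class_id)] = {
            # Fraction of the page covered by this class
            "share": ys.size / mask.size,
            # Inclusive pixel coordinates enclosing every pixel of this class
            "bbox": (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())),
        }
    return summary


# Toy example: a 4x6 page with a small Table (ID 4) region on Background (ID 0).
demo_mask = np.zeros((4, 6), dtype=np.int64)
demo_mask[1:3, 2:5] = 4
demo_summary = summarize_mask(demo_mask)
```

Class IDs in the result can be translated to names with the label table; cropping the page to a class's bounding box before running OCR is one possible use.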

🧑‍🎓 Author

Created by Never (@nevernever69).
Feel free to open issues or discuss improvements on the Hugging Face hub.

📝 Citation

If you use this model in your work, please consider citing:

@misc{never2025doclaynetseg,
  author = {Never},
  title = {Document Layout Segmentation using DiT-base fine-tuned on DocLayNet},
  year = {2025},
  howpublished = {\url{https://huggingface.co/nevernever69/dit-doclaynet-segmentation}}
}

Safetensors model size: 0.2B params (tensor type F32)