# Uniform INT8 Quantized DeepSeek-OCR
This model is a uniformly quantized version of deepseek-ai/DeepSeek-OCR.
## Quantization Details
- Method: Uniform INT8 quantization
- Quantized Layers: 2342
- Vision Layers: 96 @ 8-bit
- Language Layers: 2197 @ 8-bit
- Average Bit-width: 8.00
- Original Size: 6363.12 MB
- Compressed Size: 3351.56 MB
- Compression Ratio: 1.90x
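For reference, uniform symmetric per-tensor INT8 quantization can be sketched as below. This is a generic illustration of the method, not necessarily the exact scheme used to produce this checkpoint (per-channel scales or asymmetric zero-points are common variations):

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Uniform symmetric per-tensor INT8 quantization (illustrative sketch).

    Maps the float tensor onto [-127, 127] with a single scale factor.
    """
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximation of the original float tensor."""
    return q.to(torch.float32) * scale

w = torch.randn(64, 64)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Per-element rounding error is bounded by half the scale step.
print((w - w_hat).abs().max())
```

Storing INT8 values plus one FP32 scale per tensor is what yields the roughly 2x size reduction over FP16 weights reported above (here 6363.12 MB / 3351.56 MB ≈ 1.90x, with the remainder going to scales and unquantized tensors).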
## Model Files
- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True
)

# Load quantized weights
state_dict = torch.load("quantized_weights.pt")

# Note: You'll need the QuantizedLinear class to properly load and use this model
```
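As a rough guide to what such a `QuantizedLinear` class involves, here is a minimal hedged sketch: it stores INT8 weights with a float scale and dequantizes at forward time. The buffer names (`weight_int8`, `scale`) and layout are assumptions for illustration; the actual class shipped with this repository may differ:

```python
import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    """Sketch of an INT8 linear layer (illustrative; names are assumptions).

    Stores int8 weights plus a per-tensor float scale as buffers, and
    dequantizes to the activation dtype on the fly in forward().
    """
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.register_buffer(
            "weight_int8",
            torch.zeros(out_features, in_features, dtype=torch.int8),
        )
        self.register_buffer("scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize weights, then apply a standard linear transform.
        w = self.weight_int8.to(x.dtype) * self.scale
        return nn.functional.linear(x, w, self.bias)

layer = QuantizedLinear(16, 4)
out = layer(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

With such a module in place, the entries in `quantized_weights.pt` would be loaded into the `weight_int8`/`scale` buffers rather than a dense float `weight` parameter.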
## Baseline Characteristics
This uniform quantization approach:
- Applies the same 8-bit quantization to ALL layers
- Does not distinguish between vision and language modalities
- Serves as a baseline for comparison with modality-aware methods
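The uniform baseline described above amounts to walking every linear layer and applying the same bit-width, with no check on which modality the layer belongs to. A hedged sketch (function and config names are illustrative, not from this repo), shown here as fake-quantization for brevity:

```python
import torch
import torch.nn as nn

def quantize_all_linear_uniform(model: nn.Module, bits: int = 8) -> dict:
    """Uniform baseline sketch: every nn.Linear gets the same bit-width,
    whether it sits in the vision tower or the language model.
    Fake-quantizes weights in place and returns a per-layer config."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    configs = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            scale = module.weight.abs().max() / qmax
            q = torch.clamp(
                torch.round(module.weight / scale), -qmax, qmax
            ).to(torch.int8)
            # Replace weights with their dequantized (rounded) values.
            module.weight.data = q.to(module.weight.dtype) * scale
            configs[name] = {"bits": bits, "scale": scale.item()}
    return configs

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
cfg = quantize_all_linear_uniform(model)
print(len(cfg))  # 2 quantized layers
```

A modality-aware method would instead branch on the layer's location (e.g. vision encoder vs. language decoder) and assign different bit-widths; this card's checkpoint deliberately does not.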
## Citation
If you use this model, please cite the original model and mention the uniform quantization approach.