# Uniform INT8 Quantized DeepSeek-OCR
This model is a uniformly quantized version of deepseek-ai/DeepSeek-OCR.
## Quantization Details
- Method: Uniform INT8 quantization
- Quantized Layers: 2342
- Vision Layers: 96 @ 8-bit
- Language Layers: 2197 @ 8-bit
- Average Bit-width: 8.00
- Original Size: 6363.12 MB
- Compressed Size: 3351.56 MB
- Compression Ratio: 1.90x
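For reference, uniform symmetric per-tensor INT8 quantization can be sketched as below. This is a generic illustration of the method, not necessarily the exact scheme used to produce this checkpoint (per-channel scales or asymmetric zero-points are common variations):

```python
import torch

def quantize_int8(weight: torch.Tensor):
    """Uniform symmetric per-tensor INT8 quantization (illustrative sketch).

    Maps the float tensor onto [-127, 127] with a single scale factor.
    """
    scale = weight.abs().max() / 127.0
    q = torch.clamp(torch.round(weight / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximation of the original float tensor."""
    return q.to(torch.float32) * scale

w = torch.randn(64, 64)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Per-element rounding error is bounded by half the scale step.
print((w - w_hat).abs().max())
```

Storing INT8 values plus one FP32 scale per tensor is what yields the roughly 2x size reduction over FP16 weights reported above (here 6363.12 MB / 3351.56 MB ≈ 1.90x, with the remainder going to scales and unquantized tensors).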
## Model Files
- `quantized_weights.pt`: Quantized model weights
- `quantization_info.json`: Layer-wise quantization configuration
- `layer_configs.json`: Detailed layer configurations
- `compression_stats.json`: Compression statistics
- `layer_analysis.json`: Modality analysis (vision/language/other)
## Usage
```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "SamMikaelson/deepseek-ocr-int8-quantized", trust_remote_code=True
)

# Load quantized weights
state_dict = torch.load("quantized_weights.pt")

# Note: You'll need the QuantizedLinear class to properly load and use this model
```
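As a rough guide to what such a `QuantizedLinear` class involves, here is a minimal hedged sketch: it stores INT8 weights with a float scale and dequantizes at forward time. The buffer names (`weight_int8`, `scale`) and layout are assumptions for illustration; the actual class shipped with this repository may differ:

```python
import torch
import torch.nn as nn

class QuantizedLinear(nn.Module):
    """Sketch of an INT8 linear layer (illustrative; names are assumptions).

    Stores int8 weights plus a per-tensor float scale as buffers, and
    dequantizes to the activation dtype on the fly in forward().
    """
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.register_buffer(
            "weight_int8",
            torch.zeros(out_features, in_features, dtype=torch.int8),
        )
        self.register_buffer("scale", torch.ones(()))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize weights, then apply a standard linear transform.
        w = self.weight_int8.to(x.dtype) * self.scale
        return nn.functional.linear(x, w, self.bias)

layer = QuantizedLinear(16, 4)
out = layer(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 4])
```

With such a module in place, the entries in `quantized_weights.pt` would be loaded into the `weight_int8`/`scale` buffers rather than a dense float `weight` parameter.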
## Baseline Characteristics
This uniform quantization approach:
- Applies the same 8-bit quantization to ALL layers
- Does not distinguish between vision and language modalities
- Serves as a baseline for comparison with modality-aware methods
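The uniform baseline described above amounts to walking every linear layer and applying the same bit-width, with no check on which modality the layer belongs to. A hedged sketch (function and config names are illustrative, not from this repo), shown here as fake-quantization for brevity:

```python
import torch
import torch.nn as nn

def quantize_all_linear_uniform(model: nn.Module, bits: int = 8) -> dict:
    """Uniform baseline sketch: every nn.Linear gets the same bit-width,
    whether it sits in the vision tower or the language model.
    Fake-quantizes weights in place and returns a per-layer config."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8
    configs = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            scale = module.weight.abs().max() / qmax
            q = torch.clamp(
                torch.round(module.weight / scale), -qmax, qmax
            ).to(torch.int8)
            # Replace weights with their dequantized (rounded) values.
            module.weight.data = q.to(module.weight.dtype) * scale
            configs[name] = {"bits": bits, "scale": scale.item()}
    return configs

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
cfg = quantize_all_linear_uniform(model)
print(len(cfg))  # 2 quantized layers
```

A modality-aware method would instead branch on the layer's location (e.g. vision encoder vs. language decoder) and assign different bit-widths; this card's checkpoint deliberately does not.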
## Citation
If you use this model, please cite the original model and mention the uniform quantization approach.