🥘 BEiT Fine-Tuned on Food Image Classification
(Food101 + UECFood256 + FruitVeg81)
📌 Model Summary
This model is a fine-tuned BEiT image encoder for food image classification, trained on a merged dataset combining three major datasets:
- Food101
- UECFood256
- FruitVeg81
It predicts 409 food classes, including fruits, vegetables, pastries, soups, desserts, etc.
1. Model Details
🧠 Model Description
- Model type: Image classification (BEiT backbone)
- Base model: `microsoft/beit-base-patch16-384`
- Fine-tuned by: Jean Petit BIKIM (TSOTSA Team)
- Task: Food Image Classification
- Framework: 🤗 Transformers + PyTorch
- License: MIT (or specify one)
- Language: Vision-only
🔗 Model Sources
- Repository: (add link)
- Paper (optional): _In preparation_
- Demo (optional): Add Gradio link here
2. Uses
✅ Intended Use
- Food image recognition
- Nutrition information analysis
- Food engineering & dietetics
- Dataset enrichment
- Preprocessing for multimodal VLMs (Qwen2.5-VL, CLIP-based models)
❌ Out-of-Scope
- Medical nutrition diagnosis
- Real-time embedded recognition
- Images far outside the food domain
3. Bias, Risks & Limitations
⚠️ Limitations
- May confuse visually similar dishes
⚠️ Risks
- Misclassification may propagate errors in downstream nutrition estimation.
✅ Recommendations
Use alongside:
- segmentation
- object detection
- multimodal reasoning models
4. How to Use the Model
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load the processor and fine-tuned model (replace with the actual repo id).
processor = AutoImageProcessor.from_pretrained("your-username/your-model")
model = AutoModelForImageClassification.from_pretrained("your-username/your-model")

# Load an image and convert to RGB (handles grayscale or RGBA inputs).
img = Image.open("example.jpg").convert("RGB")
inputs = processor(img, return_tensors="pt")

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring class and map it to its label name.
pred = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[pred])
```
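For applications such as nutrition analysis, the top-5 predictions with confidence scores are often more useful than a single label. The sketch below shows the post-processing only, using dummy logits in place of a real forward pass (with the actual model, `logits` would come from `outputs.logits` and indices would map to names via `model.config.id2label`):

```python
import torch

# Dummy logits standing in for `outputs.logits` from the model above
# (batch of 1, 409 food classes).
logits = torch.zeros(1, 409)
logits[0, 7] = 5.0   # pretend class 7 is the strongest prediction
logits[0, 42] = 3.0  # runner-up

# Convert logits to probabilities and take the 5 most likely classes.
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5, dim=-1)

for score, idx in zip(top.values[0], top.indices[0]):
    # With the real model: name = model.config.id2label[idx.item()]
    print(f"class {idx.item()}: {score.item():.3f}")
```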