Instructions to use minemaster01/qwen25-vl-3b-floorplan-sft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use minemaster01/qwen25-vl-3b-floorplan-sft with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct") model = PeftModel.from_pretrained(base_model, "minemaster01/qwen25-vl-3b-floorplan-sft") - Notebooks
- Google Colab
- Kaggle
Qwen2.5-VL floor plan SFT adapter (stage 1)
Hub: mudasir13cs/qwen25-vl-3b-floorplan-sft
Improved using Qwen — LoRA supervised fine-tuning (SFT) on CubiCasa5K (CC BY‑NC 4.0), built on Qwen2.5-VL-3B-Instruct (LICENSE).
Intended non-commercial / research use, consistent with the Qwen research license and CubiCasa NC terms.
Stage 2 (optional): GRPO fine-tune — mudasir13cs/qwen25-vl-3b-floorplan-grpo. In a local training checkout, see floorplan-vlm-grpo/README.md for the same style of usage doc.
Paper & upstream training material
- Method: FloorplanVLM (arXiv:2602.06507)
- Original Hub collection: manitocross/floorplan-vlm-training — readme, Zenodo→CubiCasa5K download, SVG→JSON labels, GRPO overview.
- SFT recipe (canonical source on Hub):
train_floorplan_vlm.py—MODEL_ID,HUB_MODEL_ID,OUTPUT_DIR,DATA_DIR, epochs, LR, LoRA layout,SYSTEM_PROMPT/USER_PROMPT(use these strings at inference).
If you are working inside a clone of your training repo, the same script may exist locally as train_floorplan_vlm.py.
Quick install
Inference / loading only:
pip install torch torchvision transformers peft accelerate pillow
Full SFT training (matches script docstring; includes data prep deps):
pip install torch torchvision transformers trl peft datasets accelerate shapely Pillow lxml numpy tqdm huggingface_hub
# optional GPU attention: pip install flash-attn
Loading the adapter
Use Hub IDs (recommended):
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
BASE = "Qwen/Qwen2.5-VL-3B-Instruct"
ADAPTER = "mudasir13cs/qwen25-vl-3b-floorplan-sft"
processor = AutoProcessor.from_pretrained(BASE)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
BASE, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()
If you cloned weights into this folder locally: set ADAPTER = "./floorplan-vlm-sft" (or an absolute path) instead of the Hub repo id.
Using the model (inference)
This stage defines the recommended prompts: the system message embeds the output JSON schema. Keep SYSTEM_PROMPT and USER_PROMPT aligned with train_floorplan_vlm.py constants.
User text: “Vectorize this floor plan into structured JSON with all walls, doors, windows, and rooms.”
End-to-end pattern (matches the inference test at the end of train_floorplan_vlm.py):
import json, re, torch
from PIL import Image
SYSTEM_PROMPT = (
"You are a floor plan vectorization expert. Extract wall, door, window geometry "
"from floor plan images into structured JSON.\n\n"
"Output ONLY valid JSON with this schema:\n"
'{"walls":[{"id":"wall_N","start":[x,y],"end":[x,y],"thickness":T,"curvature":0,'
'"openings":[{"type":"door"|"window","center":D,"width":W}]}],'
'"rooms":[{"label":"room_type","walls":["wall_N",...]}]}\n\n'
"Coordinates normalized so longer image edge = 1024."
)
USER_PROMPT = "Vectorize this floor plan into structured JSON with all walls, doors, windows, and rooms."
image = Image.open("plan.png").convert("RGB")
messages = [
{"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": USER_PROMPT}]},
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True)
inputs = {k: v.to(model.device) if hasattr(v, "to") else v for k, v in inputs.items()}
with torch.no_grad():
out = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
raw = processor.batch_decode(out[:, inputs.input_ids.shape[1] :], skip_special_tokens=True)[0]
m = re.search(r"\{[\s\S]*\}", raw)
plan = json.loads(m.group()) if m else None
Output shape: top-level walls (with optional openings) and rooms. Example JSON is documented under Output JSON Schema in the Manitocross training README.
Reproducing stage 1
huggingface-cli loginif you push to Hub (PUSH_TO_HUBin script).- Run
train_floorplan_vlm.py: first run downloads CubiCasa5K from Zenodo into./cubicasa_data(~5GB). TuneNUM_EPOCHS,MAX_SAMPLES,LEARNING_RATE,HUB_MODEL_ID, etc. in the configuration block at the top of that file.
Training details
- Base model:
Qwen/Qwen2.5-VL-3B-Instruct - Dataset: CubiCasa5K · Zenodo
- Method: LoRA SFT (
SFTTrainer/ TRL; see script) - Supervision targets: SVG-derived structured JSON (
walls,rooms, openings)
Add your own run metadata: epochs, LR, seed, GPU, date, partial MAX_SAMPLES if applicable.
Citation
@article{floorplanvlm2026,
title={FloorplanVLM: A Vision-Language Model for Floorplan Vectorization},
journal={arXiv preprint arXiv:2602.06507},
year={2026}
}
Acknowledgments
- FloorplanVLM (arXiv:2602.06507)
- CubiCasa5K (arXiv:1904.01920)
- Qwen2.5-VL-3B-Instruct
- Upstream training reference: manitocross/floorplan-vlm-training
- Stage 2 (optional):
mudasir13cs/qwen25-vl-3b-floorplan-grpo
Author / contact
Mudasir — multimodal AI, VLM fine-tuning, retrieval/RAG research, and engineering; MS AI Convergence, 숭실대학교 — Soongsil University, Seoul. More credentials, publications, and projects: mudasir13cs.github.io
- Hugging Face: @mudasir13cs
- GitHub: @mudasir13cs
- Email: mudasir13cs@gmail.com
- Downloads last month
- 18
Model tree for minemaster01/qwen25-vl-3b-floorplan-sft
Base model
Qwen/Qwen2.5-VL-3B-Instruct