# Llama 3.2 3B SoftLabel
A LoRA adapter for Llama 3.2 3B Instruct, fine-tuned with a KL-divergence loss against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher for Big Five personality prediction.
Unlike standard cross-entropy on hard labels, KL-divergence training preserves the teacher's uncertainty, producing calibrated probability estimates over 5-point Likert responses.
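The difference between the two objectives can be sketched in a few lines of PyTorch. This is a minimal illustration, not the training script: the shapes and tensors are made up, but the `batchmean` reduction matches the loss listed below.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: student logits and GRM-teacher probabilities
# over the five Likert options (teacher rows sum to 1).
student_logits = torch.randn(4, 5)
teacher_probs = torch.softmax(torch.randn(4, 5), dim=-1)

# F.kl_div expects log-probabilities as input; "batchmean" divides
# the summed KL by the batch size.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    teacher_probs,
    reduction="batchmean",
)

# Contrast: hard-label cross-entropy keeps only the teacher's argmax
# response and discards the rest of the distribution.
hard_labels = teacher_probs.argmax(dim=-1)
ce_loss = F.cross_entropy(student_logits, hard_labels)
```

Training against the full distribution is what lets the student inherit the teacher's calibrated spread over neighbouring Likert options rather than collapsing onto a single answer.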
## Training

| Setting | Value |
|---|---|
| Base model | Meta Llama 3.2 3B Instruct |
| Loss | KL divergence (batchmean) |
| Precision | bf16 |
| Infrastructure | University cluster (SLURM), 2× NVIDIA RTX A6000 48 GB |
## Data

- 11,250 train / 1,250 validation / 3,125 test episodes
- Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over the Likert responses 1–5
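For intuition, here is how a GRM teacher turns a latent trait score into a soft label over the five responses. This sketch uses Samejima's graded response model with point-estimate parameters; a Bayesian teacher would average these probabilities over the posterior of `theta`, `a`, and `b`, and all values shown are hypothetical.

```python
import math


def grm_soft_labels(theta, a, b):
    """Category probabilities for a 5-point Likert item under the
    Graded Response Model:
        P(Y >= k) = sigmoid(a * (theta - b[k-1]))
        P(Y = k)  = P(Y >= k) - P(Y >= k+1)
    theta: latent trait score, a: discrimination, b: 4 ordered thresholds.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities, padded with P(Y >= 1) = 1 and P(Y >= 6) = 0.
    cum = [1.0] + [sigmoid(a * (theta - bk)) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(5)]


# Example: a respondent slightly above average on the trait.
probs = grm_soft_labels(theta=0.3, a=1.5, b=[-1.5, -0.5, 0.5, 1.5])
```

The resulting vector sums to 1 and spreads mass over adjacent options, which is exactly the uncertainty the KL objective asks the student to reproduce.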
## Hyperparameters

| Setting | Value |
|---|---|
| LoRA r / alpha / dropout | 16 / 16 / 0.05 |
| Target modules | q, k, v, o, gate, up, down proj |
| Learning rate | 2e-4 (cosine schedule, 100 warmup steps) |
| Effective batch size | 32 (4 per GPU × 2 GPUs × 4 grad accum) |
| Max epochs | 3 (early stopping, patience = 5) |
| Optimizer | AdamW (fused, weight decay 0.01) |
| Max sequence length | 4096 |
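The adapter settings above map directly onto a Hugging Face PEFT configuration. This is a sketch of that mapping, not the author's actual training code, which is not published in the card.

```python
from peft import LoraConfig

# LoRA settings from the hyperparameter table; the seven projection
# module names follow the standard Llama naming convention.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```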
## Results

| Metric | Value |
|---|---|
| Best eval loss (KL div) | – |
| Final train loss | 0.0015 |
| Best checkpoint | step 500 |
| Test accuracy | 51.07% |
| Teacher ceiling | 51.28% |
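One plausible reading of "test accuracy" versus "teacher ceiling" (an assumption, since the card does not define the metric) is that both the student and the GRM teacher are scored by whether their argmax Likert option matches the observed response, so the student at 51.07% nearly matches what its teacher could pass on at 51.28%.

```python
def argmax_accuracy(prob_rows, true_labels):
    """Fraction of rows whose highest-probability option (index 0-4)
    equals the observed response index."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == y
        for p, y in zip(prob_rows, true_labels)
    )
    return hits / len(true_labels)


# Toy example with made-up predictions: one hit, one miss.
student = [[0.1, 0.2, 0.4, 0.2, 0.1], [0.5, 0.2, 0.1, 0.1, 0.1]]
labels = [2, 1]
argmax_accuracy(student, labels)  # -> 0.5
```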
## Model tree for DavidL123/Llama-3.2-3B-SoftLabel

- Base model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuned from: unsloth/Llama-3.2-3B-Instruct