# Llama 3.2 3B SoftLabel
A LoRA adapter for Llama 3.2 3B Instruct, fine-tuned with a KL-divergence loss against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher for Big Five personality prediction.
Unlike standard cross-entropy on hard labels, KL-divergence training preserves the teacher's uncertainty, producing calibrated probability estimates over 5-point Likert responses.
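The difference between the two objectives can be sketched in a few lines of PyTorch. This is a minimal illustration, not the training script: the shapes and tensors are made up, but the `batchmean` reduction matches the loss listed below.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: student logits and GRM-teacher probabilities
# over the five Likert options (teacher rows sum to 1).
student_logits = torch.randn(4, 5)
teacher_probs = torch.softmax(torch.randn(4, 5), dim=-1)

# F.kl_div expects log-probabilities as input; "batchmean" divides
# the summed KL by the batch size.
loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    teacher_probs,
    reduction="batchmean",
)

# Contrast: hard-label cross-entropy keeps only the teacher's argmax
# response and discards the rest of the distribution.
hard_labels = teacher_probs.argmax(dim=-1)
ce_loss = F.cross_entropy(student_logits, hard_labels)
```

Training against the full distribution is what lets the student inherit the teacher's calibrated spread over neighbouring Likert options rather than collapsing onto a single answer.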
## Training

| Setting | Value |
|---|---|
| Base model | Meta Llama 3.2 3B Instruct |
| Loss | KL divergence (batchmean) |
| Precision | bf16 |
| Infrastructure | University cluster (SLURM), 2× NVIDIA RTX A6000 48 GB |
## Data

- 11,250 train / 1,250 validation / 3,125 test episodes
- Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over the Likert responses 1–5
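For intuition, here is how a GRM teacher turns a latent trait score into a soft label over the five responses. This sketch uses Samejima's graded response model with point-estimate parameters; a Bayesian teacher would average these probabilities over the posterior of `theta`, `a`, and `b`, and all values shown are hypothetical.

```python
import math


def grm_soft_labels(theta, a, b):
    """Category probabilities for a 5-point Likert item under the
    Graded Response Model:
        P(Y >= k) = sigmoid(a * (theta - b[k-1]))
        P(Y = k)  = P(Y >= k) - P(Y >= k+1)
    theta: latent trait score, a: discrimination, b: 4 ordered thresholds.
    """
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities, padded with P(Y >= 1) = 1 and P(Y >= 6) = 0.
    cum = [1.0] + [sigmoid(a * (theta - bk)) for bk in b] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(5)]


# Example: a respondent slightly above average on the trait.
probs = grm_soft_labels(theta=0.3, a=1.5, b=[-1.5, -0.5, 0.5, 1.5])
```

The resulting vector sums to 1 and spreads mass over adjacent options, which is exactly the uncertainty the KL objective asks the student to reproduce.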
## Hyperparameters

| Setting | Value |
|---|---|
| LoRA r / alpha / dropout | 16 / 16 / 0.05 |
| Target modules | q, k, v, o, gate, up, down proj |
| Learning rate | 2e-4 (cosine schedule, 100 warmup steps) |
| Effective batch size | 32 (4 per GPU × 2 GPUs × 4 grad accum) |
| Max epochs | 3 (early stopping, patience = 5) |
| Optimizer | AdamW (fused, weight decay 0.01) |
| Max sequence length | 4096 |
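The adapter settings above map directly onto a Hugging Face PEFT configuration. This is a sketch of that mapping, not the author's actual training code, which is not published in the card.

```python
from peft import LoraConfig

# LoRA settings from the hyperparameter table; the seven projection
# module names follow the standard Llama naming convention.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```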
## Results

| Metric | Value |
|---|---|
| Best eval loss (KL div) | – |
| Final train loss | 0.0015 |
| Best checkpoint | step 500 |
| Test accuracy | 51.07% |
| Teacher ceiling | 51.28% |
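One plausible reading of "test accuracy" versus "teacher ceiling" (an assumption, since the card does not define the metric) is that both the student and the GRM teacher are scored by whether their argmax Likert option matches the observed response, so the student at 51.07% nearly matches what its teacher could pass on at 51.28%.

```python
def argmax_accuracy(prob_rows, true_labels):
    """Fraction of rows whose highest-probability option (index 0-4)
    equals the observed response index."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == y
        for p, y in zip(prob_rows, true_labels)
    )
    return hits / len(true_labels)


# Toy example with made-up predictions: one hit, one miss.
student = [[0.1, 0.2, 0.4, 0.2, 0.1], [0.5, 0.2, 0.1, 0.1, 0.1]]
labels = [2, 1]
argmax_accuracy(student, labels)  # -> 0.5
```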
## Model tree for DavidL123/Llama-3.2-3B-SoftLabel

- Base model: meta-llama/Llama-3.2-3B-Instruct
- Fine-tuned from: unsloth/Llama-3.2-3B-Instruct