Llama 3.2 3B SoftLabel

A LoRA adapter fine-tuned with a KL-divergence loss against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher, for Big Five personality prediction.

Unlike standard cross-entropy training on hard labels, KL-divergence training preserves the teacher's uncertainty, producing calibrated probability estimates over 5-point Likert responses.
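The objective above can be sketched with PyTorch's `kl_div`: the student's log-probabilities over the five Likert options are matched against the teacher's soft distribution. The batch contents below are illustrative assumptions, not the actual training data.

```python
# Minimal sketch of the soft-label KL objective (illustrative batch only).
import torch
import torch.nn.functional as F

def soft_label_kl_loss(student_logits: torch.Tensor,
                       teacher_probs: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) with the 'batchmean' reduction used in training."""
    log_student = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")

# Two items, five Likert options (1-5) each.
student_logits = torch.tensor([[2.0, 1.0, 0.5, 0.2, 0.1],
                               [0.1, 0.3, 2.5, 0.4, 0.2]])
teacher_probs = torch.tensor([[0.50, 0.30, 0.10, 0.07, 0.03],
                              [0.05, 0.15, 0.60, 0.15, 0.05]])
loss = soft_label_kl_loss(student_logits, teacher_probs)
```

With a hard (one-hot) target this reduces to cross-entropy; with a soft teacher distribution the student is additionally penalized for being overconfident where the teacher is uncertain.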

Training

  • Base model: Meta Llama 3.2 3B Instruct
  • Loss: KL divergence (batchmean)
  • Precision: bf16
  • Infrastructure: university cluster (SLURM), 2x NVIDIA RTX A6000 48GB

Data

  • 11,250 train / 1,250 valid / 3,125 test episodes
  • Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over responses 1–5
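One way to picture a training target: for each questionnaire item, the GRM teacher emits a probability distribution over the 1–5 Likert options. The item text and probabilities below are made-up illustrations, not actual dataset contents.

```python
# Hypothetical shape of one soft-label target (illustrative values).
episode_target = {
    "item": "I am the life of the party.",          # example IPIP-style item
    "soft_label": [0.05, 0.10, 0.20, 0.40, 0.25],   # P(response = 1..5)
}

# A valid soft label is a proper distribution over the five options.
assert abs(sum(episode_target["soft_label"]) - 1.0) < 1e-9
```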

Hyperparameters

  • LoRA r / alpha / dropout: 16 / 16 / 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Learning rate: 2e-4 (cosine schedule, 100 warmup steps)
  • Effective batch size: 16 (4 per GPU x 2 GPUs x 4 grad accum)
  • Max epochs: 3 (early stopping, patience = 5)
  • Optimizer: AdamW (fused), weight decay 0.01
  • Max sequence length: 4096
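The LoRA settings above correspond to a PEFT `LoraConfig` roughly like the following. The exact training script is not published, so this is a sketch under stated assumptions, not the author's code.

```python
# Assumed PEFT configuration matching the listed hyperparameters.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```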

Results

  • Best eval loss (KL div): —
  • Final train loss: 0.0015
  • Best checkpoint: step 500
  • Test accuracy: 51.07%
  • Teacher ceiling: 51.28%
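The adapter can be loaded on top of the base model with PEFT. A minimal sketch, assuming the standard `meta-llama/Llama-3.2-3B-Instruct` base repository (gated weights; dtype chosen to match the bf16 training precision):

```python
# Load the LoRA adapter over the base Instruct model (assumed base repo id).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(
    "DavidL123/Llama-3.2-3B-SoftLabel",
    torch_dtype=torch.bfloat16,  # matches training precision
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
```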
Model repository: DavidL123/Llama-3.2-3B-SoftLabel (LoRA adapter)