Paper: Boosting Large Language Models with Mask Fine-Tuning (arXiv:2503.22764)
This model is a fully fine-tuned (FFT) version of LLaMA-2-7B on coding datasets, trained as part of a replication of the Mask Fine-Tuning (MFT) paper.
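A minimal inference sketch with the standard 🤗 Transformers API is shown below; the repository id is a placeholder, not the actual id of this checkpoint.

```python
# Minimal inference sketch. "your-username/llama2-7b-fft-coding" is a
# placeholder repo id -- replace it with the actual model id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/llama2-7b-fft-coding"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that checks whether a string is a palindrome.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```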
The model was trained on 30,000 samples drawn from three coding datasets, matching the paper's setup; a rough training sketch is shown below.
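The exact configuration lives in the linked repository; the following is only a sketch of standard full fine-tuning with the Hugging Face Trainer, with the dataset file, prompt formatting, and hyperparameters as placeholders rather than the values used for this checkpoint.

```python
# Rough full fine-tuning (FFT) sketch with the Hugging Face Trainer.
# Dataset file, formatting, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_id)

# Placeholder corpus: any instruction/code data with "prompt" and "response" fields.
dataset = load_dataset("json", data_files="coding_30k.jsonl", split="train")

def tokenize(example):
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-7b-fft-coding",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```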
This model serves as the FFT baseline for the Mask Fine-Tuning paper replication and will be evaluated on HumanEval. Evaluation is still pending; results will be posted here once available.
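For reference, a HumanEval run with OpenAI's human-eval harness (https://github.com/openai/human-eval) typically looks like the sketch below, using greedy decoding for pass@1. The model id is again a placeholder, and the generation settings are assumptions, not the settings that will be used for the reported numbers.

```python
# Sketch of HumanEval generation with OpenAI's human-eval harness.
import torch
from human_eval.data import read_problems, write_jsonl
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/llama2-7b-fft-coding"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

samples = []
for task_id, problem in read_problems().items():
    inputs = tokenizer(problem["prompt"], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=384, do_sample=False)
    # Keep only the newly generated completion, not the echoed prompt.
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                  skip_special_tokens=True)
    samples.append({"task_id": task_id, "completion": completion})

write_jsonl("samples.jsonl", samples)
# Then score with:  evaluate_functional_correctness samples.jsonl
```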
If you use this model, please cite the original MFT paper:
@article{mft2025,
  title   = {Boosting Large Language Models with Mask Fine-Tuning},
  author  = {[Authors from paper]},
  journal = {arXiv preprint arXiv:2503.22764v1},
  year    = {2025}
}
Training configuration and code are available in the accompanying GitHub repository.
This model inherits the LLaMA 2 Community License from the base model.
Base model: meta-llama/Llama-2-7b-hf