Thanks! Just pushed the repo public: github.com/yuriyvnv/TTS-Augmented-ASR
This is the codebase behind a paper I wrote on Estonian and Slovenian, so you'll find the full pipeline there: not just the Parakeet fine-tuning scripts, but also the synthetic data generation (LLM text diversification + OpenAI TTS synthesis) that powers the augmentation. Everything was trained on a single NVIDIA H100.
One thing worth knowing for African languages:
Parakeet v3 is only pretrained on 25 languages, so you'd be doing cross-lingual transfer from scratch. The base won't recognize the language zero-shot, but fine-tuning still works; just expect a much rougher starting point than what you saw in my models.
Always evaluate zero-shot first. I had one language (Polish) where fine-tuning actually made things worse, likely due to domain mismatch or a learning rate that was too low (still analyzing why this happened).
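For the zero-shot baseline, all you really need is WER on a held-out set before any fine-tuning. A minimal sketch of the metric itself, in pure Python (the repo likely uses jiwer or NeMo's built-in metrics instead; this is just to show what's being measured):

```python
# Word error rate: word-level Levenshtein distance over reference length.
# Illustrative only; not the repo's actual eval code.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the bat sat"))  # 1 substitution / 3 words
```

Run this over your zero-shot transcripts first; if WER is near 100%, you know you're in the cross-lingual-transfer regime rather than adapting an already-capable base.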
Standard recipe worked across everything I tried: AdamW, lr=5e-5, cosine annealing, 10% warmup, bf16, batch 32-64, early stopping on val_wer. Larger batch sizes helped noticeably for Parakeet models: since the model is compact, bigger batches give more stable gradient estimates during training.
Happy to help if you hit anything weird.