---
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
license: mit
title: VoiceAPI
tags:
- tts
- text-to-speech
- indian-languages
- vits
- multilingual
- speech-synthesis
language:
- hi
- bn
- mr
- te
- kn
- en
- bho
- mai
- mag
- hne
- gu
---
# 🎙️ VoiceAPI - Multi-lingual Indian Language TTS
An advanced **multi-speaker, multilingual text-to-speech (TTS) synthesizer** supporting 11 Indian languages with 21 voice options.
## 🌟 Features
- **11 Indian Languages**: Hindi, Bengali, Marathi, Telugu, Kannada, Gujarati, Bhojpuri, Chhattisgarhi, Maithili, Magahi, English
- **21 Voice Options**: Male and female voices for every language except Gujarati, which has a single female voice
- **High-Quality Audio**: 22050 Hz sample rate (16000 Hz for Gujarati), natural prosody
- **REST API**: Simple GET/POST endpoints for easy integration
- **Real-time Synthesis**: Fast inference on CPU/GPU
## 🗣️ Supported Languages
| Language | Code | Female | Male | Script |
|----------|------|--------|------|--------|
| Hindi | hi | ✅ | ✅ | देवनागरी |
| Bengali | bn | ✅ | ✅ | বাংলা |
| Marathi | mr | ✅ | ✅ | देवनागरी |
| Telugu | te | ✅ | ✅ | తెలుగు |
| Kannada | kn | ✅ | ✅ | ಕನ್ನಡ |
| Gujarati | gu | ✅ (MMS) | - | ગુજરાતી |
| Bhojpuri | bho | ✅ | ✅ | देवनागरी |
| Chhattisgarhi | hne | ✅ | ✅ | देवनागरी |
| Maithili | mai | ✅ | ✅ | देवनागरी |
| Magahi | mag | ✅ | ✅ | देवनागरी |
| English | en | ✅ | ✅ | Latin |
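The table can also be written as data, which makes the voice count easy to verify. This is only an illustrative sketch: `VOICES` and `voice_ids` are hypothetical names that do not come from `src/config.py`, and the `<code>_<gender>` ID format is an assumption.

```python
# Hypothetical catalog mirroring the table above; not the project's real config.
VOICES = {
    "hi": ("Hindi", ("female", "male")),
    "bn": ("Bengali", ("female", "male")),
    "mr": ("Marathi", ("female", "male")),
    "te": ("Telugu", ("female", "male")),
    "kn": ("Kannada", ("female", "male")),
    "gu": ("Gujarati", ("female",)),  # female only, served by the MMS model
    "bho": ("Bhojpuri", ("female", "male")),
    "hne": ("Chhattisgarhi", ("female", "male")),
    "mai": ("Maithili", ("female", "male")),
    "mag": ("Magahi", ("female", "male")),
    "en": ("English", ("female", "male")),
}


def voice_ids():
    """Return illustrative voice IDs such as 'hi_female' (the format is assumed)."""
    return [
        f"{code}_{gender}"
        for code, (_name, genders) in VOICES.items()
        for gender in genders
    ]


print(len(VOICES), "languages,", len(voice_ids()), "voices")
# 11 languages, 21 voices
```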
## 📡 API Usage
### Endpoint
```
https://harshil748-voiceapi.hf.space/
```
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` | string | Yes | Text to synthesize (lowercase for English) |
| `lang` | string | Yes | Language name (hindi, bengali, etc.) |
| `speaker_wav` | file | Yes | Reference WAV file (for API compatibility) |
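A small client-side check can catch bad input before a request is sent. The sketch below is an assumption-laden helper, not part of the API: `build_params` and `ASSUMED_LANG_NAMES` are hypothetical names, and only the spellings `hindi`, `bengali`, and `english` appear in this README. The other language names are guesses that follow the same lowercase pattern.

```python
# Only "hindi", "bengali" and "english" are confirmed by this README;
# the remaining spellings are assumptions following the same pattern.
ASSUMED_LANG_NAMES = {
    "hindi", "bengali", "marathi", "telugu", "kannada", "gujarati",
    "bhojpuri", "chhattisgarhi", "maithili", "magahi", "english",
}


def build_params(text: str, lang: str) -> dict:
    """Validate and normalise the `text` and `lang` query parameters."""
    lang = lang.strip().lower()
    if lang not in ASSUMED_LANG_NAMES:
        raise ValueError(f"unsupported language: {lang!r}")
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    if lang == "english":
        text = text.lower()  # the parameter table asks for lowercase English
    return {"text": text, "lang": lang}


print(build_params("Hello World", "English"))
```

The returned dictionary can be passed as `params=` to `requests`; the reference WAV still has to be attached separately as `speaker_wav`.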
### Example (Python)
```python
import requests

base_url = 'https://harshil748-voiceapi.hf.space/Get_Inference'
WavPath = 'reference.wav'

params = {
    'text': 'नमस्ते, आप कैसे हैं?',
    'lang': 'hindi',
}

# Upload the reference WAV as multipart form data alongside the query parameters.
with open(WavPath, "rb") as AudioFile:
    response = requests.get(base_url, params=params, files={'speaker_wav': AudioFile.read()})

if response.status_code == 200:
    # The response body is the synthesized audio.
    with open('output.wav', 'wb') as f:
        f.write(response.content)
    print("Audio saved as 'output.wav'")
else:
    print(f"Request failed with status {response.status_code}")
```
### Example (cURL)
```bash
curl -X POST "https://harshil748-voiceapi.hf.space/Get_Inference?text=hello&lang=english" \
  -F "speaker_wav=@reference.wav" \
  -o output.wav
```
## 🏗️ Model Architecture
- **Base Model**: VITS (Variational Inference with adversarial learning for Text-to-Speech)
- **Encoder**: Transformer-based text encoder (6 layers, 192 hidden channels)
- **Decoder**: HiFi-GAN neural vocoder
- **Duration Predictor**: Stochastic duration predictor for natural prosody
- **Sample Rate**: 22050 Hz (16000 Hz for Gujarati MMS)
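Because the sample rate differs between the VITS voices and the Gujarati MMS voice, it can be worth checking the audio a client receives. The following is a minimal sketch using only the standard-library `wave` module. It assumes the API returns PCM-encoded WAV data, which this README does not state; `describe_wav` and `silent_wav` are hypothetical helpers, and the silent clip stands in for a real response body.

```python
import io
import wave

EXPECTED_RATES = {22050, 16000}  # 16000 Hz applies to the Gujarati MMS voice


def describe_wav(data: bytes) -> tuple[int, float]:
    """Return (sample_rate_hz, duration_seconds) for PCM WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def silent_wav(rate: int, seconds: float) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used here only as sample input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


rate, duration = describe_wav(silent_wav(22050, 1.0))
print(rate in EXPECTED_RATES, rate, duration)
```

With a real response, `describe_wav(response.content)` would replace the synthetic clip.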
## 📊 Training
### Datasets Used
| Dataset | Languages | Source | License |
|---------|-----------|--------|---------|
| OpenSLR-103 | Hindi | [OpenSLR](https://www.openslr.org/103/) | CC BY 4.0 |
| OpenSLR-37 | Bengali | [OpenSLR](https://www.openslr.org/37/) | CC BY 4.0 |
| OpenSLR-64 | Marathi | [OpenSLR](https://www.openslr.org/64/) | CC BY 4.0 |
| OpenSLR-66 | Telugu | [OpenSLR](https://www.openslr.org/66/) | CC BY 4.0 |
| OpenSLR-79 | Kannada | [OpenSLR](https://www.openslr.org/79/) | CC BY 4.0 |
| OpenSLR-78 | Gujarati | [OpenSLR](https://www.openslr.org/78/) | CC BY 4.0 |
| Common Voice | Hindi, Bengali | [Mozilla](https://commonvoice.mozilla.org/) | CC0 |
| IndicTTS | Multiple | [IIT Madras](https://www.iitm.ac.in/donlab/tts/) | Research |
| Indic-Voices | Multiple | [AI4Bharat](https://ai4bharat.iitm.ac.in/indic-voices/) | CC BY 4.0 |
### Training Configuration
- **Epochs**: 1000
- **Batch Size**: 32
- **Learning Rate**: 2e-4
- **Optimizer**: AdamW
- **FP16 Training**: Enabled
- **Hardware**: NVIDIA V100/A100 GPUs
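For quick reference, the hyperparameters above can be captured in a small config object. This is a sketch only: `TrainingConfig` and `steps_per_epoch` are hypothetical names, the real settings live in the training configs, and the 10,000-utterance dataset size is a made-up figure used purely to illustrate the arithmetic.

```python
import math
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the list above; field names are illustrative."""
    epochs: int = 1000
    batch_size: int = 32
    learning_rate: float = 2e-4
    optimizer: str = "AdamW"
    fp16: bool = True

    def steps_per_epoch(self, num_utterances: int) -> int:
        """Optimizer steps per epoch, counting a final partial batch."""
        return math.ceil(num_utterances / self.batch_size)


config = TrainingConfig()
print(asdict(config))

# Hypothetical dataset size, chosen only to show the calculation.
steps = config.steps_per_epoch(10_000)
print(steps, "steps per epoch,", steps * config.epochs, "steps in total")
```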
See the `training/` directory for full training scripts and configurations.
## 🚀 Deployment
This API is deployed on HuggingFace Spaces using Docker:
```dockerfile
FROM python:3.10-slim
# ... installs dependencies
# Downloads models from Harshil748/VoiceAPI-Models
# Runs FastAPI server on port 7860
```
Models are hosted separately at [Harshil748/VoiceAPI-Models](https://huggingface.co/Harshil748/VoiceAPI-Models) (~8GB).
## 📁 Project Structure
```
VoiceAPI/
├── app.py                  # HuggingFace Spaces entry point
├── Dockerfile              # Docker configuration
├── requirements.txt        # Python dependencies
├── download_models.py      # Model downloader
├── src/
│   ├── api.py              # FastAPI REST server
│   ├── engine.py           # TTS inference engine
│   ├── config.py           # Voice configurations
│   └── tokenizer.py        # Text tokenization
└── training/
    ├── train_vits.py       # VITS training script
    ├── prepare_dataset.py  # Data preparation
    ├── export_model.py     # Model export
    ├── datasets.csv        # Dataset links
    └── configs/            # Training configs
```
## 📜 License
- **Code**: MIT License
- **Models**: CC BY 4.0 (following SYSPIN licensing)
- **Datasets**: Individual licenses (see training/datasets.csv)
## 🙏 Acknowledgments
- [SYSPIN IISc SPIRE Lab](https://syspin.iisc.ac.in/) for pre-trained VITS models
- [Facebook MMS](https://github.com/facebookresearch/fairseq/tree/main/examples/mms) for Gujarati TTS
- [Coqui TTS](https://github.com/coqui-ai/TTS) for the TTS library
- [AI4Bharat](https://ai4bharat.iitm.ac.in/) for Indian language resources
## 📧 Contact
Built for the **Voice Tech for All** Hackathon - Multi-lingual TTS for healthcare assistants serving low-income communities.