---
title: TextSyncMimi Speech Editing
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: cc-by-4.0
---

# TextSyncMimi Speech Editing Demo

Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing.

## What This Demo Does

1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles
2. **Token-Level Analysis**: See how the text is tokenized (LLaMA-3.1 tokenizer)
3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions
4. **Real-time Editing**: Hear the results instantly

## How to Use

### Step 1: Configure Voices

- Enter your text transcript
- Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
- (Optional) Add style instructions like "speak slowly" or "sound excited"

### Step 2: Generate Audio

- Click "Generate & Process" to create both audio samples
- The demo shows the tokenization and generates a baseline reconstruction

### Step 3: Swap Embeddings

- Enter the token indices to swap (e.g., "0,2,5")
- Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions

## Examples

### Example 1: Word-Level Swapping

**Text**: "Hello, how are you today?"

- Tokens 0-1: "Hello" (swap these)
- Result: The first word has Voice 2's style; the rest keeps Voice 1's style

### Example 2: Prosody Transfer

- **Voice 1**: "speak slowly and calmly"
- **Voice 2**: "speak quickly with excitement"
- **Swap indices**: Middle of the sentence
- **Result**: The sentence starts calm and becomes excited midway

## For Users

Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.

## For Developers (Running Your Own Copy)

Want to run your own version? Here's how:

1. **Duplicate this Space** or create a new one
2. Copy the files (`app.py`, `requirements.txt`, `README.md`)
3. **Add your OpenAI API key as a Secret**:
   - Go to Space Settings → Repository secrets
   - Click "New secret"
   - Name: `OPENAI_API_KEY`
   - Value: Your OpenAI API key
   - Click "Add secret"
4. The Space will automatically restart with your key (securely stored, never exposed)

## Technical Details

- **Model**: TextSyncMimi-v1 (loaded from [HuggingFace Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
- **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from HuggingFace)
- **Text Embeddings**: Embeddings built into the model (4096-dim)
- **Audio Codec**: Mimi (24 kHz, 12.5 fps)
- **TTS Provider**: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
- **Security**: API keys stored securely in Space secrets

## Links

- 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)
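
The swap in Step 3 can be pictured as replacing selected positions in one per-token sequence with the entries from another. The sketch below is only an illustration of that idea, assuming one speech embedding per text token and using plain Python lists as stand-ins for the model's tensors. `parse_indices` and `swap_embeddings` are hypothetical helper names, not functions taken from `app.py`.

```python
from typing import List, Sequence


def parse_indices(text: str, num_tokens: int) -> List[int]:
    """Parse a string such as "0,2,5" into sorted, de-duplicated token indices."""
    indices = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        idx = int(part)  # raises ValueError on non-numeric input
        if not 0 <= idx < num_tokens:
            raise ValueError(f"index {idx} is outside 0..{num_tokens - 1}")
        indices.add(idx)
    return sorted(indices)


def swap_embeddings(voice1: Sequence, voice2: Sequence, indices: Sequence[int]) -> list:
    """Return a copy of voice1 with voice2's entries at the given positions."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same number of text tokens")
    edited = list(voice1)
    for idx in indices:
        edited[idx] = voice2[idx]
    return edited


# Toy example: one placeholder "embedding" per text token (hypothetical data).
voice1 = ["a0", "a1", "a2", "a3", "a4", "a5"]
voice2 = ["b0", "b1", "b2", "b3", "b4", "b5"]
picked = parse_indices("0, 2,5", num_tokens=len(voice1))
edited = swap_embeddings(voice1, voice2, picked)
print(picked)  # [0, 2, 5]
print(edited)  # ['b0', 'a1', 'b2', 'a3', 'a4', 'b5']
```

Validating the indices before swapping keeps a typo such as an out-of-range position from silently producing an unedited result.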
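
Secrets added in the Space settings are exposed to the running app as environment variables. A minimal sketch of how an app might read `OPENAI_API_KEY` and fail early when it is missing follows; `get_openai_api_key` is a hypothetical helper, and the demo's own `app.py` may handle this differently.

```python
import os


def get_openai_api_key(env=None) -> str:
    """Return the OpenAI API key from the environment, or raise if it is unset."""
    # Accept an explicit mapping so the function is easy to test;
    # fall back to the real process environment otherwise.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Space Settings -> Repository secrets"
        )
    return key
```

Reading the key at startup, rather than hard-coding it, keeps it out of the repository files that get copied when a Space is duplicated.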
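
For a sense of scale, the codec figures in Technical Details (24 kHz audio, 12.5 frames per second) imply 1920 samples, or 80 ms, per Mimi frame. The short calculation below works this out; `num_frames` is a hypothetical helper, and rounding a partial frame up is an assumption about padding rather than documented behavior.

```python
import math

SAMPLE_RATE_HZ = 24_000  # Mimi audio sample rate, from Technical Details
FRAME_RATE_HZ = 12.5     # Mimi frames per second, from Technical Details

SAMPLES_PER_FRAME = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)
FRAME_DURATION_MS = 1000 / FRAME_RATE_HZ


def num_frames(num_samples: int) -> int:
    """Frames needed to cover num_samples, rounding a partial frame up (assumed)."""
    return math.ceil(num_samples / SAMPLES_PER_FRAME)


print(SAMPLES_PER_FRAME)               # 1920
print(FRAME_DURATION_MS)               # 80.0
print(num_frames(3 * SAMPLE_RATE_HZ))  # 38 (3 s of audio is 37.5 frames)
```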