---
title: TextSyncMimi Speech Editing
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: cc-by-4.0
---

# TextSyncMimi Speech Editing Demo

Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing.

## What This Demo Does

1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles
2. **Token-Level Analysis**: See how the text is tokenized (LLaMA-3.1 tokenizer)
3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions
4. **Real-time Editing**: Hear the results instantly

## How to Use

### Step 1: Configure Voices
- Enter your text transcript
- Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
- (Optional) Add style instructions like "speak slowly" or "sound excited"

### Step 2: Generate Audio
- Click "Generate & Process" to create both audio samples
- The demo then displays the tokenization and generates a baseline reconstruction

### Step 3: Swap Embeddings
- Enter token indices to swap (e.g., "0,2,5")
- Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
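
The swap in Step 3 can be pictured as follows. This is a minimal sketch, not the demo's actual code: `parse_indices` and `swap_embeddings` are hypothetical helpers, and it assumes each voice yields one speech embedding per text token, represented here as plain Python lists.

```python
def parse_indices(text, num_tokens):
    """Parse a string like "0,2,5" into sorted, de-duplicated token indices."""
    indices = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        idx = int(part)  # raises ValueError on non-numeric input
        if not 0 <= idx < num_tokens:
            raise ValueError(f"token index {idx} out of range 0..{num_tokens - 1}")
        indices.add(idx)
    return sorted(indices)


def swap_embeddings(voice1, voice2, indices):
    """Return a copy of voice1 with voice2's entries at the given positions."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same tokens")
    edited = list(voice1)
    for idx in indices:
        edited[idx] = voice2[idx]
    return edited


# Toy stand-ins for per-token speech embeddings (hypothetical values).
voice1 = ["a0", "a1", "a2", "a3", "a4", "a5"]
voice2 = ["b0", "b1", "b2", "b3", "b4", "b5"]
edited = swap_embeddings(voice1, voice2, parse_indices("0,2,5", len(voice1)))
print(edited)  # ['b0', 'a1', 'b2', 'a3', 'a4', 'b5']
```

Validating the indices against the token count before swapping means a typo in the index field fails with a clear error rather than producing a silently wrong edit.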

## Examples

### Example 1: Word-Level Swapping
**Text**: "Hello, how are you today?"
- Tokens 0-1: "Hello" (swap these)
- Result: The first word has Voice 2's style; the rest keeps Voice 1's style

### Example 2: Prosody Transfer
- **Voice 1**: "speak slowly and calmly"
- **Voice 2**: "speak quickly with excitement"
- **Swap indices**: Middle of sentence
- **Result**: Sentence starts calm, becomes excited mid-way
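
To build the index list for a mid-sentence swap like this one, a small helper can generate it from the token count. A sketch only: `second_half_indices` is a hypothetical name, and it assumes "middle of sentence" means every token from the midpoint onward.

```python
def second_half_indices(num_tokens):
    """Comma-separated indices from the midpoint to the last token."""
    start = num_tokens // 2  # assumption: the swap begins at the midpoint
    return ",".join(str(i) for i in range(start, num_tokens))


print(second_half_indices(7))  # 3,4,5,6
```

The resulting string can be pasted into the token indices field as-is.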

## For Users

Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.

## For Developers (Running Your Own Copy)

Want to run your own version? Here's how:

1. **Duplicate this Space** or create a new one
2. Copy the files (`app.py`, `requirements.txt`, `README.md`)
3. **Add your OpenAI API key as a Secret**:
   - Go to Space Settings → Repository secrets
   - Click "New secret"
   - Name: `OPENAI_API_KEY`
   - Value: Your OpenAI API key
   - Click "Add secret"
4. The Space will automatically restart with your key (securely stored, never exposed)
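
Spaces expose secrets to the running app as environment variables, so the app can read the key at startup. A minimal sketch, assuming the key is read through `os.environ`; `get_openai_api_key` is a hypothetical helper and may not match what this demo's `app.py` does.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it as a secret in the Space settings"
        )
    return key
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authentication error raised later during TTS generation.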

## Technical Details

- **Model**: TextSyncMimi-v1 (loaded from [Hugging Face Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
- **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
- **Text Embeddings**: Built into the model (4096-dim)
- **Audio Codec**: Mimi (24 kHz, 12.5 fps)
- **TTS Provider**: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
- **Security**: API keys stored securely in Space secrets

## Links

- 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)