Space: potsawee/TextSyncMimi-SpeechEditing


title: TextSyncMimi Speech Editing
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: cc-by-4.0

TextSyncMimi Speech Editing Demo

Interactive demo for TextSyncMimi, a text-synchronous neural audio codec that enables token-level speech editing.

What This Demo Does

Generate Speech: Use OpenAI TTS to create two audio samples with different voices and speaking styles
Token-Level Analysis: See how the text is tokenized (LLaMA-3.1 tokenizer)
Speech Embedding Swapping: Swap speech characteristics at specific token positions
Real-time Editing: Hear the results instantly

How to Use

Step 1: Configure Voices

Enter your text transcript
Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
(Optional) Add style instructions like "speak slowly" or "sound excited"

Step 2: Generate Audio

Click "Generate & Process" to create both audio samples
The demo then displays the tokenization and generates a baseline reconstruction
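As a rough sketch of what Steps 1 and 2 involve on the TTS side, the snippet below builds one request per voice and picks the model based on whether style instructions were given (gpt-4o-mini-tts with instructions, tts-1 otherwise). The helper names `build_tts_request` and `synthesize` are hypothetical and are not taken from app.py; the actual app may structure this differently.

```python
from typing import Optional


def build_tts_request(text: str, voice: str, instructions: Optional[str] = None) -> dict:
    """Build keyword arguments for one OpenAI text-to-speech request."""
    if not text.strip():
        raise ValueError("transcript must not be empty")
    request = {"input": text, "voice": voice}
    style = (instructions or "").strip()
    if style:
        # Style instructions are only sent together with gpt-4o-mini-tts.
        request["model"] = "gpt-4o-mini-tts"
        request["instructions"] = style
    else:
        request["model"] = "tts-1"
    return request


def synthesize(request: dict, out_path: str) -> str:
    """Send the request and save the audio. Needs OPENAI_API_KEY and network access."""
    from openai import OpenAI  # imported lazily so build_tts_request works offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(out_path)
    return out_path
```

Calling `build_tts_request` twice with the same transcript and two different voices gives the two samples that the later steps compare.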

Step 3: Swap Embeddings

Enter the token indices to swap as a comma-separated list (e.g., "0,2,5"); indices start at 0
Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
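The swap itself can be pictured as follows. This is a minimal sketch that assumes each voice is represented as one speech embedding per text token, held in a plain Python list; the real model's data structures are not described here and may differ. `parse_indices` and `swap_embeddings` are hypothetical names.

```python
from typing import List, Sequence


def parse_indices(raw: str, num_tokens: int) -> List[int]:
    """Parse a string like "0,2,5" into sorted, unique, in-range token indices."""
    indices = set()
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        index = int(part)  # raises ValueError on non-numeric input
        if not 0 <= index < num_tokens:
            raise ValueError(f"index {index} is outside 0..{num_tokens - 1}")
        indices.add(index)
    return sorted(indices)


def swap_embeddings(voice1: Sequence, voice2: Sequence, indices: Sequence[int]) -> list:
    """Return a copy of voice1 with voice2's entries at the given positions."""
    if len(voice1) != len(voice2):
        raise ValueError("both voices must cover the same number of tokens")
    edited = list(voice1)
    for index in indices:
        edited[index] = voice2[index]
    return edited


# Placeholder strings stand in for per-token embeddings.
voice1 = ["a0", "a1", "a2", "a3", "a4", "a5"]
voice2 = ["b0", "b1", "b2", "b3", "b4", "b5"]
print(swap_embeddings(voice1, voice2, parse_indices("0,2,5", len(voice1))))
# prints ['b0', 'a1', 'b2', 'a3', 'a4', 'b5']
```

Because both samples are generated from the same transcript, they share the same token positions, which is what makes a position-by-position swap meaningful.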

Examples

Example 1: Word-Level Swapping

Text: "Hello, how are you today?"

Tokens 0-1: "Hello" (swap these)
Result: The first word takes on Voice 2's style; the rest keeps Voice 1's style

Example 2: Prosody Transfer

Voice 1: "speak slowly and calmly"
Voice 2: "speak quickly with excitement"
Swap indices: Middle of sentence
Result: Sentence starts calm, becomes excited mid-way
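One way to read "middle of sentence" is to swap every token from the midpoint onward, so the calm opening is kept and the excited delivery takes over for the rest. The helper below is a hypothetical convenience for producing that index string from the token count shown by the demo; it is an interpretation, not part of the demo.

```python
def second_half_indices(num_tokens: int) -> str:
    """Return comma-separated indices covering the second half of the tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    start = num_tokens // 2  # midpoint, rounded down
    return ",".join(str(i) for i in range(start, num_tokens))


print(second_half_indices(7))  # prints 3,4,5,6
```

The resulting string can be pasted into the token indices field in Step 3.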

For Users

Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.

For Developers (Running Your Own Copy)

Want to run your own version? Here's how:

Duplicate this Space or create a new one
Copy the files (app.py, requirements.txt, README.md)
Add your OpenAI API key as a Secret:
- Go to Space Settings → Repository secrets
- Click "New secret"
- Name: OPENAI_API_KEY
- Value: Your OpenAI API key
- Click "Add secret"
The Space will automatically restart with your key (securely stored, never exposed)
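Space secrets are exposed to the running app as environment variables, so the key can be read at startup without ever appearing in the repository. The sketch below shows one way to do that and fail early with a clear message; `load_openai_api_key` is a hypothetical helper, not necessarily what app.py does.

```python
import os
from typing import Mapping


def load_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read OPENAI_API_KEY from the environment, or raise with a helpful hint."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Space Settings → Repository secrets"
        )
    return key
```

Passing the environment as a parameter keeps the function easy to test locally with a plain dictionary.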

Technical Details

Model: TextSyncMimi-v1 (loaded from Hugging Face Hub)
Tokenizer: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
Text Embeddings: Embeddings built into the model (4096-dim)
Audio Codec: Mimi (24 kHz, 12.5 frames per second)
TTS Provider: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
Security: API keys stored securely in Space secrets
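The codec figures above imply some simple arithmetic: at 24 kHz and 12.5 frames per second, each Mimi frame spans 1920 audio samples, or 80 ms. The snippet below works this out; rounding the frame count up for partial frames is an assumption for illustration, and the codec's actual padding behavior may differ.

```python
import math

SAMPLE_RATE_HZ = 24_000  # Mimi sample rate
FRAME_RATE_HZ = 12.5     # Mimi frames per second


def samples_per_frame() -> int:
    """Number of audio samples covered by one codec frame."""
    return int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)


def frame_duration_ms() -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / FRAME_RATE_HZ


def frames_for_duration(seconds: float) -> int:
    """Frames needed to cover a clip, rounding partial frames up (assumption)."""
    return math.ceil(seconds * FRAME_RATE_HZ)


print(samples_per_frame())       # prints 1920
print(frame_duration_ms())       # prints 80.0
print(frames_for_duration(2.0))  # prints 25
```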