metadata
title: TextSyncMimi Speech Editing
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: cc-by-4.0

TextSyncMimi Speech Editing Demo

Interactive demo for TextSyncMimi, a text-synchronous neural audio codec that enables token-level speech editing.

What This Demo Does

  1. Generate Speech: Use OpenAI TTS to create two audio samples with different voices and speaking styles
  2. Token-Level Analysis: See how the text is tokenized (LLaMA-3.1 tokenizer)
  3. Speech Embedding Swapping: Swap speech characteristics at specific token positions
  4. Real-time Editing: Hear the results instantly

How to Use

Step 1: Configure Voices

  • Enter your text transcript
  • Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
  • (Optional) Add style instructions like "speak slowly" or "sound excited"
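The voice settings above map naturally onto one TTS request per voice. The following is a minimal sketch, not the demo's actual code: `build_tts_request` is a hypothetical helper, and the rule that style instructions select `gpt-4o-mini-tts` while their absence falls back to `tts-1` is an assumption based on the Technical Details section.

```python
from typing import Optional


def build_tts_request(text: str, voice: str, instructions: Optional[str] = None) -> dict:
    """Build keyword arguments for one OpenAI TTS request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("transcript must not be empty")
    request = {"input": text, "voice": voice}
    if instructions and instructions.strip():
        # Assumption: style instructions are only sent with gpt-4o-mini-tts.
        request["model"] = "gpt-4o-mini-tts"
        request["instructions"] = instructions.strip()
    else:
        # Assumption: without instructions the demo falls back to tts-1.
        request["model"] = "tts-1"
    return request


voice_1 = build_tts_request("Hello, how are you today?", "alloy", "speak slowly")
voice_2 = build_tts_request("Hello, how are you today?", "echo")
```

With the OpenAI Python SDK, each dictionary could then be passed along as `client.audio.speech.create(**request)`.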

Step 2: Generate Audio

  • Click "Generate & Process" to create both audio samples
  • The model will show you the tokenization and generate a baseline reconstruction

Step 3: Swap Embeddings

  • Enter token indices to swap (e.g., "0,2,5")
  • Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
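Conceptually, the swap replaces Voice 1's speech embedding with Voice 2's at each selected token position. The sketch below illustrates that idea on toy data. It assumes one embedding per text token and uses string labels in place of real vectors; `parse_indices` and `swap_embeddings` are hypothetical names, not functions from `app.py`.

```python
from typing import List, Sequence


def parse_indices(spec: str, num_tokens: int) -> List[int]:
    """Parse a string such as "0,2,5" into sorted, unique, in-range token indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and spaces
        index = int(part)  # raises ValueError on non-numeric input
        if not 0 <= index < num_tokens:
            raise ValueError(f"token index {index} is outside 0..{num_tokens - 1}")
        indices.add(index)
    return sorted(indices)


def swap_embeddings(voice_1: Sequence, voice_2: Sequence, indices: Sequence[int]) -> list:
    """Return a copy of voice_1 with the entries at `indices` taken from voice_2."""
    if len(voice_1) != len(voice_2):
        raise ValueError("both voices must cover the same number of tokens")
    edited = list(voice_1)
    for i in indices:
        edited[i] = voice_2[i]
    return edited


# Toy stand-ins: one label per token instead of a real embedding vector.
voice_1 = ["a0", "a1", "a2", "a3", "a4", "a5"]
voice_2 = ["b0", "b1", "b2", "b3", "b4", "b5"]
edited = swap_embeddings(voice_1, voice_2, parse_indices("0,2,5", len(voice_1)))
print(edited)  # ['b0', 'a1', 'b2', 'a3', 'a4', 'b5']
```

Validating the indices before swapping keeps a typo such as an out-of-range position from silently producing an unedited result.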

Examples

Example 1: Word-Level Swapping

Text: "Hello, how are you today?"

  • Tokens 0-1: "Hello" (swap these by entering "0,1")
  • Result: First word has Voice 2's style, rest has Voice 1's style

Example 2: Prosody Transfer

  • Voice 1: "speak slowly and calmly"
  • Voice 2: "speak quickly with excitement"
  • Swap indices: Middle of sentence
  • Result: Sentence starts calm, becomes excited mid-way
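One way to read "middle of sentence" is to swap every token from the midpoint to the end. The helper below is a hypothetical sketch of that reading; it builds the comma-separated index string the demo expects. The token count of 7 is an arbitrary illustration, so use the count shown in the demo's tokenization output.

```python
def indices_from_midpoint(num_tokens: int) -> str:
    """Return comma-separated indices covering the second half of the tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    start = num_tokens // 2
    return ",".join(str(i) for i in range(start, num_tokens))


swap_spec = indices_from_midpoint(7)
print(swap_spec)  # 3,4,5,6
```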

For Users

Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.

For Developers (Running Your Own Copy)

Want to run your own version? Here's how:

  1. Duplicate this Space or create a new one
  2. Copy the files (app.py, requirements.txt, README.md)
  3. Add your OpenAI API key as a Secret:
    • Go to Space Settings → Repository secrets
    • Click "New secret"
    • Name: OPENAI_API_KEY
    • Value: Your OpenAI API key
    • Click "Add secret"
  4. The Space will automatically restart with your key (securely stored, never exposed)
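Hugging Face Spaces expose secrets to the running app as environment variables, so the key can be read at startup without ever appearing in the repository. This is a minimal sketch of such a lookup; `get_openai_api_key` is a hypothetical helper and the actual `app.py` may read the key differently.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read OPENAI_API_KEY from the environment and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it as a secret in the Space settings"
        )
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first TTS request.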

Technical Details

  • Model: TextSyncMimi-v1 (loaded from the Hugging Face Hub)
  • Tokenizer: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
  • Text Embeddings: Embeddings built into the model (4096-dim)
  • Audio Codec: Mimi (24 kHz, 12.5 fps)
  • TTS Provider: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
  • Security: API keys stored securely in Space secrets
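The codec figures above imply a simple relationship between audio length and codec frames: at 24 kHz and 12.5 frames per second, each frame spans 1920 samples, or 80 ms. The sketch below works through that arithmetic. Rounding up to a whole frame is an assumption; the model's actual padding behavior may differ.

```python
import math

SAMPLE_RATE_HZ = 24_000  # Mimi sample rate
FRAME_RATE_HZ = 12.5     # Mimi frames per second

SAMPLES_PER_FRAME = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # 1920 samples
FRAME_DURATION_MS = 1000 / FRAME_RATE_HZ                 # 80.0 ms


def num_frames(duration_seconds: float) -> int:
    """Number of codec frames needed to cover a clip, rounding up (assumption)."""
    return math.ceil(duration_seconds * FRAME_RATE_HZ)


print(SAMPLES_PER_FRAME, FRAME_DURATION_MS, num_frames(2.0))  # 1920 80.0 25
```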

Links