Spaces: potsawee/TextSyncMimi-SpeechEditing

potsawee committed on Oct 14

Commit c3d23ec (verified) · 1 parent: e150831

Upload README.md with huggingface_hub

Files changed (1): README.md (+76 -7)

@@ -1,13 +1,82 @@
 ---
-title: TextSyncMimi SpeechEditing
-emoji: ⚡
-colorFrom: yellow
-colorTo: gray
 sdk: gradio
-sdk_version: 5.49.1
 app_file: app.py
 pinned: false
-short_description: TextSyncMimi for Speech Editing
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: TextSyncMimi Speech Editing
+emoji: 🎙️
+colorFrom: blue
+colorTo: purple
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: cc-by-4.0
 ---
+# TextSyncMimi Speech Editing Demo
+Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing.
+## What This Demo Does
+1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles
+2. **Token-Level Analysis**: See how the text is tokenized (LLaMA-3.1 tokenizer)
+3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions
+4. **Real-time Editing**: Hear the results instantly
+## How to Use
+### Step 1: Configure Voices
+- Enter your text transcript
+- Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
+- (Optional) Add style instructions like "speak slowly" or "sound excited"
+### Step 2: Generate Audio
+- Click "Generate & Process" to create both audio samples
+- The model will show you the tokenization and generate a baseline reconstruction
+### Step 3: Swap Embeddings
+- Enter token indices to swap (e.g., "0,2,5")
+- Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
+## Examples
+### Example 1: Word-Level Swapping
+**Text**: "Hello, how are you today?"
+- Tokens 0-1: "Hello" (swap these)
+- Result: First word has Voice 2's style, rest has Voice 1's style
+### Example 2: Prosody Transfer
+**Voice 1**: "speak slowly and calmly"
+**Voice 2**: "speak quickly with excitement"
+**Swap indices**: Middle of sentence
+**Result**: The sentence starts calm and becomes excited midway
+## For Users
+Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.
+## For Developers (Running Your Own Copy)
+Want to run your own version? Here's how:
+1. **Duplicate this Space** or create a new one
+2. Copy the files (`app.py`, `requirements.txt`, `README.md`)
+3. **Add your OpenAI API key as a Secret**:
+   - Go to Space Settings → Repository secrets
+   - Click "New secret"
+   - Name: `OPENAI_API_KEY`
+   - Value: Your OpenAI API key
+   - Click "Add secret"
+4. The Space will automatically restart with your key (securely stored, never exposed)
+## Technical Details
+- **Model**: TextSyncMimi-v1 (loaded from [Hugging Face Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
+- **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
+- **Text Embeddings**: Embeddings built into the model (4096-dim)
+- **Audio Codec**: Mimi (24 kHz, 12.5 fps)
+- **TTS Provider**: OpenAI (gpt-4o-mini-tts with instructions, or tts-1)
+- **Security**: API keys stored securely in Space secrets
+## Links
+- 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)
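
The token-level swap described under "Step 3: Swap Embeddings" in the README above can be illustrated with a minimal sketch. It is not the Space's `app.py`. It assumes each utterance is represented as one speech embedding per text token, and that both voices read the same transcript so the two sequences have equal length. The helper names `parse_indices` and `swap_embeddings` are hypothetical, and plain strings stand in for the embedding vectors.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def parse_indices(spec: str, num_tokens: int) -> List[int]:
    """Parse a comma-separated index string such as "0,2,5" (hypothetical helper)."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        idx = int(part)
        if not 0 <= idx < num_tokens:
            raise ValueError(f"token index {idx} is outside 0..{num_tokens - 1}")
        if idx not in indices:
            indices.append(idx)  # drop duplicates, keep first-seen order
    return indices


def swap_embeddings(voice1: Sequence[T], voice2: Sequence[T],
                    indices: Sequence[int]) -> List[T]:
    """Return a copy of voice1 with voice2's entries at the given token positions."""
    if len(voice1) != len(voice2):
        raise ValueError("both sequences must cover the same tokens")
    edited = list(voice1)
    for i in indices:
        edited[i] = voice2[i]
    return edited


# Placeholder strings stand in for per-token speech embeddings.
v1 = ["v1_t0", "v1_t1", "v1_t2", "v1_t3"]
v2 = ["v2_t0", "v2_t1", "v2_t2", "v2_t3"]
print(swap_embeddings(v1, v2, parse_indices("0,2", len(v1))))
# prints ['v2_t0', 'v1_t1', 'v2_t2', 'v1_t3']
```

In this sketch the edited sequence keeps Voice 1 everywhere except the selected positions, which matches the README's description of hearing "Voice 1 with Voice 2's characteristics at those positions". Decoding the edited sequence back to audio would be the model's job and is not shown.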