Commit c3d23ec (verified), committed by potsawee
1 Parent(s): e150831

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -7
README.md CHANGED
@@ -1,13 +1,82 @@
  ---
- title: TextSyncMimi SpeechEditing
- emoji:
- colorFrom: yellow
- colorTo: gray
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
  pinned: false
- short_description: TextSyncMimi for Speech Editing
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: TextSyncMimi Speech Editing
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
+ license: cc-by-4.0
  ---
 
+ # TextSyncMimi Speech Editing Demo
+
+ Interactive demo for **TextSyncMimi**, a text-synchronous neural audio codec that enables token-level speech editing.
+
+ ## What This Demo Does
+
+ 1. **Generate Speech**: Use OpenAI TTS to create two audio samples with different voices and speaking styles
+ 2. **Token-Level Analysis**: See how the text is tokenized (LLaMA-3.1 tokenizer)
+ 3. **Speech Embedding Swapping**: Swap speech characteristics at specific token positions
+ 4. **Real-time Editing**: Hear the results instantly
+
+ ## How to Use
+
+ ### Step 1: Configure Voices
+ - Enter your text transcript
+ - Select two different OpenAI TTS voices (e.g., "alloy" and "echo")
+ - (Optional) Add style instructions like "speak slowly" or "sound excited"
+
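The voice and style settings from Step 1 map naturally onto a text-to-speech request. The sketch below is not taken from `app.py`. It assumes the app uses the OpenAI Python SDK's `client.audio.speech.create` call and picks `gpt-4o-mini-tts` when instructions are given, falling back to `tts-1` otherwise (the two models named under Technical Details). The helper name `build_tts_request` is hypothetical.

```python
from typing import Optional


def build_tts_request(text: str, voice: str, instructions: Optional[str] = None) -> dict:
    """Build keyword arguments for one OpenAI TTS request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("transcript must not be empty")
    request = {"voice": voice, "input": text}
    if instructions and instructions.strip():
        # Assumption: style instructions are only sent with gpt-4o-mini-tts.
        request["model"] = "gpt-4o-mini-tts"
        request["instructions"] = instructions.strip()
    else:
        request["model"] = "tts-1"
    return request


# One request per voice, sharing the same transcript.
request_1 = build_tts_request("Hello, how are you today?", "alloy", "speak slowly")
request_2 = build_tts_request("Hello, how are you today?", "echo")

# With the `openai` package installed and OPENAI_API_KEY set, each request
# could then be sent with:
#   from openai import OpenAI
#   audio = OpenAI().audio.speech.create(**request_1)
```

Building the arguments separately from the network call keeps the voice and style logic easy to check without spending API credits.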
+ ### Step 2: Generate Audio
+ - Click "Generate & Process" to create both audio samples
+ - The demo then shows the tokenization and generates a baseline reconstruction
+
+ ### Step 3: Swap Embeddings
+ - Enter token indices to swap (e.g., "0,2,5")
+ - Click "Perform Swap" to hear Voice 1 with Voice 2's characteristics at those positions
+
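As a rough illustration of what a swap means, the sketch below parses an index string such as "0,2,5" and replaces Voice 1's per-token speech embeddings with Voice 2's at those positions. It assumes each utterance is represented as one embedding per text token, which is what makes a position-wise swap possible. The function names and the toy vectors are hypothetical, and the real app presumably operates on model tensors rather than Python lists.

```python
def parse_indices(raw: str, num_tokens: int) -> list[int]:
    """Parse a string like "0,2,5" into sorted, de-duplicated token indices."""
    indices = set()
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and extra spaces
        index = int(part)  # raises ValueError on non-numeric input
        if not 0 <= index < num_tokens:
            raise ValueError(f"index {index} is outside 0..{num_tokens - 1}")
        indices.add(index)
    return sorted(indices)


def swap_embeddings(voice_1, voice_2, indices):
    """Return a copy of voice_1 with voice_2's embeddings at the given positions."""
    if len(voice_1) != len(voice_2):
        raise ValueError("both utterances must have the same number of tokens")
    edited = list(voice_1)  # shallow copy so the original sequence is untouched
    for index in indices:
        edited[index] = voice_2[index]
    return edited


# Toy one-dimensional "embeddings" for a six-token transcript.
voice_1 = [[1.0], [1.1], [1.2], [1.3], [1.4], [1.5]]
voice_2 = [[2.0], [2.1], [2.2], [2.3], [2.4], [2.5]]
edited = swap_embeddings(voice_1, voice_2, parse_indices("0,2,5", len(voice_1)))
```

Here `edited` carries Voice 2's vectors at positions 0, 2, and 5 and Voice 1's everywhere else. Because both voices read the same transcript, position `i` refers to the same token in both sequences.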
+ ## Examples
+
+ ### Example 1: Word-Level Swapping
+ **Text**: "Hello, how are you today?"
+ - Tokens 0-1: "Hello" (swap these)
+ - Result: The first word has Voice 2's style; the rest keeps Voice 1's style
+
+ ### Example 2: Prosody Transfer
+ - **Voice 1 instructions**: "speak slowly and calmly"
+ - **Voice 2 instructions**: "speak quickly with excitement"
+ - **Swap indices**: Tokens from the middle of the sentence onward
+ - **Result**: The sentence starts calm and becomes excited midway
+
+ ## For Users
+
+ Just try the demo! The OpenAI API key is already configured. Enter text, select voices, and experiment with speech editing.
+
+ ## For Developers (Running Your Own Copy)
+
+ Want to run your own version? Here's how:
+
+ 1. **Duplicate this Space** or create a new one
+ 2. Copy the files (`app.py`, `requirements.txt`, `README.md`)
+ 3. **Add your OpenAI API key as a Secret**:
+    - Go to Space Settings → Repository secrets
+    - Click "New secret"
+    - Name: `OPENAI_API_KEY`
+    - Value: Your OpenAI API key
+    - Click "Add secret"
+ 4. The Space will automatically restart with your key (securely stored, never exposed)
+
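Hugging Face Spaces expose secrets to the running app as environment variables, so the key added in step 3 can be read at startup. A minimal sketch, assuming `app.py` reads it this way (the helper name `get_openai_api_key` is hypothetical):

```python
import os


def get_openai_api_key() -> str:
    """Read the OpenAI key from the environment, failing early if it is missing."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Space Settings as a secret."
        )
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first TTS request.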
+ ## Technical Details
+
+ - **Model**: TextSyncMimi-v1 (loaded from the [Hugging Face Hub](https://huggingface.co/potsawee/TextSyncMimi-v1))
+ - **Tokenizer**: LLaMA-3.1 (128K vocabulary, loaded from Hugging Face)
+ - **Text Embeddings**: Embeddings built into the model (4096-dim)
+ - **Audio Codec**: Mimi (24 kHz, 12.5 frames per second)
+ - **TTS Provider**: OpenAI (`gpt-4o-mini-tts` with instructions, or `tts-1`)
+ - **Security**: API keys stored securely in Space secrets
+
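The codec figures above imply a fixed amount of audio per frame. As a quick worked example (plain arithmetic on the two numbers listed, not code from the app; `num_frames` is a hypothetical helper):

```python
SAMPLE_RATE_HZ = 24_000   # Mimi sample rate listed above
FRAME_RATE_HZ = 12.5      # Mimi frames per second listed above

samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ   # 1920 samples
frame_duration_ms = 1000 / FRAME_RATE_HZ             # 80 ms


def num_frames(duration_seconds: float) -> int:
    """Number of whole codec frames covering a clip of the given length."""
    return int(duration_seconds * FRAME_RATE_HZ)


print(samples_per_frame, frame_duration_ms, num_frames(4.0))
```

Each codec frame therefore covers 1920 samples, or 80 ms of audio, and a 4-second clip spans 50 frames.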
+ ## Links
+
+ - 🤗 [Model Card](https://huggingface.co/potsawee/TextSyncMimi-v1)
+