nielsr (HF Staff) committed on
Commit 42c5ffd · verified
1 Parent(s): 97521ec

Improve model card: add library_name, pipeline_tag and link to paper


Hi! I'm Niels, part of the community science team at Hugging Face.

This PR improves the model card for `MOSS-VoiceGenerator` by:
- Adding `library_name: transformers` to the metadata, which enables the automated "Use in Transformers" button/code snippet.
- Adding `pipeline_tag: text-to-speech` to the metadata so the pipeline tag is set explicitly.
- Updating the Arxiv badge to link directly to the research paper: [MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models](https://huggingface.co/papers/2602.10934).
- Adding a BibTeX citation for the paper.

Please feel free to merge if this looks good!
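
For anyone who wants to check a model card for the two metadata fields this PR adds, here is a minimal sketch in plain Python. It is not how the Hub itself reads metadata: `read_front_matter` and `missing_fields` are hypothetical helper names, and the parser only handles the flat `key: value` pairs and `- item` lists that appear in this README's front matter, not full YAML.

```python
import re


def read_front_matter(card_text: str) -> dict:
    """Parse flat `key: value` pairs and `- item` lists from a model card's
    leading `---` block. Illustrative only; this is not a YAML parser."""
    match = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", card_text, flags=re.DOTALL)
    if match is None:
        return {}

    metadata = {}
    current_list_key = None  # key whose value is a list still being filled
    for raw_line in match.group(1).splitlines():
        stripped = raw_line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that had no inline value.
            if current_list_key is not None:
                metadata[current_list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a `key: value` line; ignored in this sketch
        key, value = key.strip(), value.strip()
        if value:
            metadata[key] = value
            current_list_key = None
        else:
            metadata[key] = []
            current_list_key = key
    return metadata


def missing_fields(metadata: dict, required=("library_name", "pipeline_tag")) -> list:
    """Return the required metadata keys that are absent from `metadata`."""
    return [name for name in required if name not in metadata]


# Front matter before and after this PR, copied from the diff.
OLD_CARD = "---\nlicense: apache-2.0\ntags:\n- text-to-speech\n---\n# MOSS-TTS Family\n"
NEW_CARD = (
    "---\nlicense: apache-2.0\npipeline_tag: text-to-speech\n"
    "library_name: transformers\ntags:\n- text-to-speech\n---\n\n# MOSS-TTS Family\n"
)

if __name__ == "__main__":
    print("missing before:", missing_fields(read_front_matter(OLD_CARD)))
    print("missing after: ", missing_fields(read_front_matter(NEW_CARD)))
```

Running the script prints which of the two fields each version of the card lacks. For real cards with nested or quoted YAML, a proper YAML library would be the safer choice.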

Files changed (1)
  1. README.md +22 -3
README.md CHANGED
@@ -1,8 +1,11 @@
  ---
  license: apache-2.0
+ pipeline_tag: text-to-speech
+ library_name: transformers
  tags:
  - text-to-speech
  ---
+
  # MOSS-TTS Family
 
  <br>
@@ -18,7 +21,7 @@ tags:
  <a href="https://github.com/OpenMOSS/MOSS-TTS/tree/main"><img src="https://img.shields.io/badge/Project%20Page-GitHub-blue"></a>
  <a href="https://modelscope.cn/collections/OpenMOSS-Team/MOSS-TTS"><img src="https://img.shields.io/badge/ModelScope-Models-lightgrey?logo=modelscope&amp"></a>
  <a href="https://mosi.cn/#models"><img src="https://img.shields.io/badge/Blog-View-blue?logo=internet-explorer&amp"></a>
- <a href="https://github.com/OpenMOSS/MOSS-TTS"><img src="https://img.shields.io/badge/Arxiv-Coming%20soon-red?logo=arxiv&amp"></a>
+ <a href="https://huggingface.co/papers/2602.10934"><img src="https://img.shields.io/badge/Arxiv-2602.10934-red?logo=arxiv&amp"></a>
 
  <a href="https://studio.mosi.cn"><img src="https://img.shields.io/badge/AIStudio-Try-green?logo=internet-explorer&amp"></a>
  <a href="https://studio.mosi.cn/docs/moss-tts"><img src="https://img.shields.io/badge/API-Docs-00A3FF?logo=fastapi&amp"></a>
@@ -27,7 +30,7 @@ tags:
  </div>
 
  ## Overview
- MOSS‑TTS Family is an open‑source **speech and sound generation model family** from [MOSI.AI](https://mosi.cn/#hero) and the [OpenMOSS team](https://www.open-moss.com/). It is designed for **high‑fidelity**, **high‑expressiveness**, and **complex real‑world scenarios**, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS.
+ MOSS‑TTS Family is an open‑source **speech and sound generation model family** from [MOSI.AI](https://mosi.cn/#hero) and the [OpenMOSS team](https://www.open-moss.com/). It is designed for **high‑fidelity**, **high‑expressiveness**, and **complex real‑world scenarios**, covering stable long‑form speech, multi‑speaker dialogue, voice/character design, environmental sound effects, and real‑time streaming TTS. It leverages the technology presented in the paper [MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models](https://huggingface.co/papers/2602.10934).
 
 
  ## Introduction
@@ -185,7 +188,7 @@ text1="哎呀,我的老腰啊,这年纪大了就是不行了。"
  instruction1="疲惫沙哑的老年声音缓慢抱怨,带有轻微呻吟。"
 
  text2="亲爱的观众们,今天我要为大家做一道传说中的龙须面,这道面条细如发丝,需要极其精湛的手艺才能制作成功,请大家仔细观看我的每一个动作。"
- instruction2="热情的美食节目主持人,语调生动活泼,充满对美食的热爱和专业精神。"
+ instruction2="热情的美食节目主持人,语调生动活泼,充满对美食的热爱 and 专业精神。"
 
  text3="Hey there, stranger! What brings you to our humble town? Looking for a good drink or a tall tale?"
  instruction3="Hearty, jovial tavern owner's voice, loud and welcoming with a slightly gruff, friendly tone in American English, radiating warmth and hospitality."
@@ -264,3 +267,19 @@ MOSS Voice Generator demonstrates significant advantages in subjective evaluatio
  <p align="center">
  <img src="https://speech-demo.oss-cn-shanghai.aliyuncs.com/moss_tts_demo/tts_readme_imgaes_demo/moss_voiceGenerator_winrate" width="85%" />
  </p>
+
+ ## Citation
+
+ If you use this model or the CAT architecture in your research, please cite:
+
+ ```bibtex
+ @misc{gong2026mossaudiotokenizerscalingaudiotokenizers,
+ title={MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models},
+ author={Yitian Gong and Kuangwei Chen and Zhaoye Fei and Xiaogui Yang and Ke Chen and Yang Wang and Kexin Huang and Mingshu Chen and Ruixiao Li and Qingyuan Cheng and Shimin Li and Xipeng Qiu},
+ year={2026},
+ eprint={2602.10934},
+ archivePrefix={arXiv},
+ primaryClass={cs.SD},
+ url={https://arxiv.org/abs/2602.10934},
+ }
+ ```