MNV-17: Nonverbal Vocalization Recognition

This repository presents Qwen2.5-Omni and Qwen2-Audio models fine-tuned on the MNV-17 dataset for joint automatic speech recognition (ASR) and nonverbal vocalization (NV) recognition. It also provides inference scripts for both models.


Key Findings

Unseen Speaker Generalization

Note: All demo samples come from speakers who were never seen during training.

This suggests that the model learned speaker-independent NV patterns rather than fitting the vocal habits of individual training speakers, i.e., it generalizes across speakers.

Model Performance

Results reported in our paper:

Model           Joint CER (lower is better)   NV Recognition Accuracy (higher is better)
Qwen2.5-Omni    3.60%                         57.29%
Qwen2-Audio     4.84%                         56.28%
SenseVoice      8.71%                         57.29%
Paraformer      5.70%                         28.64%
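Joint CER is a character error rate computed over transcripts that contain both the spoken words and the NV events. The sketch below shows one way such a metric can be computed: total Levenshtein edits divided by the total number of reference tokens. It assumes, hypothetically, that NVs are written inline as bracketed tags such as `[laugh]` and that each tag counts as one token alongside the Chinese characters; the tag format and the helper names (`tokenize`, `joint_cer`) are illustrative, and the paper's exact tokenization may differ.

```python
import re

# Hypothetical transcript format: NV events appear inline as "[laugh]", "[cough]", ...
# Each bracketed tag is one token; every other non-space character is one token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|\S")


def tokenize(text):
    return TOKEN_RE.findall(text)


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit cost for sub/ins/del)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def joint_cer(refs, hyps):
    """Corpus-level CER: total edits over total reference tokens."""
    errors = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total
```

For example, a hypothesis that drops the NV tag from `你好[laugh]` has one deletion over three reference tokens, giving a CER of 1/3. Counting each tag as a single token keeps a missed NV from being penalized by the length of its label.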

Performance Highlights

  1. Lowest Joint Error Rate: Qwen2.5-Omni achieves a joint CER of 3.60%, the lowest among the compared models on the combined ASR and NV recognition task.
  2. Strong NV Recognition: Qwen2.5-Omni reaches 57.29% NV accuracy (tied with SenseVoice) under strict exact-match evaluation, where the type, count, and order of NVs must all match the reference.
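The exact-match criterion can be expressed as a comparison of ordered NV label sequences: a prediction counts as correct only if its list of NVs is identical to the reference list, which checks type, count, and order at once. The sketch below assumes the same hypothetical bracketed-tag format (`[laugh]`); the function names are illustrative and not taken from the released scripts.

```python
import re

# Hypothetical format: NV labels appear inline as "[label]".
NV_TAG_RE = re.compile(r"\[([^\]]+)\]")


def nv_sequence(text):
    """Ordered list of NV labels found in a transcript."""
    return NV_TAG_RE.findall(text)


def nv_exact_match_accuracy(refs, hyps):
    """Fraction of utterances whose NV label sequence matches the reference exactly.

    List equality requires the same labels, the same number of them,
    and the same order; the surrounding words are ignored.
    """
    hits = sum(nv_sequence(r) == nv_sequence(h) for r, h in zip(refs, hyps))
    return hits / len(refs)
```

Under this criterion, predicting `[cough][laugh]` for a reference containing `[laugh]` then `[cough]` is wrong even though both NV types were detected.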

Dataset Characteristics

MNV-17 Dataset Advantages

  1. Performative Recording: Speakers produce NVs deliberately, which avoids the ambiguity of NVs in spontaneous speech and supports high-quality annotation.
  2. Class Balance: 17 NV categories with a balanced distribution (the largest category is only 2.7 times the size of the smallest).
  3. Speaker Diversity: 49 native Mandarin speakers from various regions.
  4. Rich Context: NVs naturally embedded in semantically rich sentences.
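The max/min ratio quoted above is a simple imbalance measure: the count of the most frequent category divided by the count of the least frequent one, so a value of 1.0 means perfectly balanced classes. A minimal sketch, assuming the ratio is taken over per-category NV occurrence counts (the paper may count utterances instead):

```python
from collections import Counter


def imbalance_ratio(labels):
    """Most frequent category count divided by least frequent category count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())
```

For instance, 27 `laugh` occurrences against 10 `cough` occurrences gives a ratio of 2.7.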

Design Innovation

  • Scripted Approach: Scripts are generated by an LLM so that each NV appears in a natural, semantically plausible context.
  • Multi-NV Combinations: Utterances contain random combinations of 1 to 3 NVs, approximating realistic usage.
  • Speaker-Independent Split: The train, validation, and test sets share no speakers, so evaluation measures generalization to unseen speakers.
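A speaker-independent split assigns whole speakers, not individual utterances, to each partition, so no voice seen in training reappears at test time. The sketch below illustrates the idea; the `(speaker_id, utterance)` record layout, the 80/10/10 fractions, and the function name are illustrative assumptions, not the actual MNV-17 split.

```python
import random


def speaker_independent_split(utterances, train_frac=0.8, val_frac=0.1, seed=0):
    """Partition (speaker_id, utterance) pairs so each speaker lands in exactly one set.

    Speakers are shuffled deterministically, then sliced into train/validation;
    all remaining speakers go to test.
    """
    speakers = sorted({spk for spk, _ in utterances})
    random.Random(seed).shuffle(speakers)
    n_train = round(len(speakers) * train_frac)
    n_val = round(len(speakers) * val_frac)
    groups = {
        "train": set(speakers[:n_train]),
        "validation": set(speakers[n_train:n_train + n_val]),
        "test": set(speakers[n_train + n_val:]),
    }
    # Every utterance follows its speaker into that speaker's partition.
    return {
        name: [u for u in utterances if u[0] in spk_set]
        for name, spk_set in groups.items()
    }
```

With 49 speakers, these illustrative fractions yield 39 training, 5 validation, and 5 test speakers. Splitting at the speaker level is what makes results on the test set evidence of cross-speaker generalization rather than speaker memorization.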
Model repository: kiiic/MNV-17-Qwen-fintune (fine-tuned model; training dataset: MNV-17)