Whisper Large Chinese (Mandarin)
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Base Model | openai/whisper-large-v2 |
| Training Data | Common Voice 11 |
| Primary Task | Automatic Speech Recognition |
What is whisper-large-zh-cv11?
Whisper-large-zh-cv11 is a speech recognition model fine-tuned from OpenAI's Whisper Large v2 specifically for Mandarin Chinese. Developed by Jonatas Grosman, it improves substantially on the base model, achieving a Character Error Rate (CER) of 9.55% on the Common Voice 11 test set, compared to 29.90% for the original model.
Implementation Details
The model was trained on the combined training and validation splits of Common Voice 11, with 1,000 samples held out for evaluation during fine-tuning. It is built on the Transformer architecture, runs on PyTorch, and integrates directly with the Hugging Face transformers library (see the usage sketch after the list below).
- Supports both raw and normalized text transcription
- Handles casing and punctuation
- Optimized for Mandarin Chinese speech recognition
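The snippet below is a minimal transcription sketch using the transformers ASR pipeline. The model id `jonatasgrosman/whisper-large-zh-cv11` and the audio path are illustrative assumptions; adjust them to the actual Hub repository and your own 16 kHz audio file.

```python
import torch
from transformers import pipeline

# Assumed Hub model id; replace with the actual repository if it differs.
MODEL_ID = "jonatasgrosman/whisper-large-zh-cv11"

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_ID,
    device=device,
)

# Force Mandarin transcription so the decoder does not auto-detect the language.
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="zh", task="transcribe"
)

# "sample.wav" is a placeholder path; chunking handles audio longer than 30 s.
result = asr("sample.wav", chunk_length_s=30)
print(result["text"])
```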
Core Capabilities
- Achieves 9.55% CER and 55.02% WER on Common Voice 11
- Performs well on out-of-domain data (11.76% CER on Fleurs dataset)
- Supports specialized handling of numerical transcriptions
- Includes language and task-specific decoder prompts
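For reference, the sketch below shows one way to compute CER with the Hugging Face `evaluate` library. The transcripts are placeholders, not the actual Common Voice 11 evaluation data that produced the figures above.

```python
import evaluate

# Character Error Rate metric (backed by jiwer); WER loads the same way
# with evaluate.load("wer").
cer_metric = evaluate.load("cer")

# Placeholder transcripts for illustration only; in practice these come from
# running the model over the Common Voice 11 Chinese test split.
predictions = ["今天天气很好", "我们去公园散步"]
references = ["今天天气很好", "我们去公园散步吧"]

print("CER:", cer_metric.compute(predictions=predictions, references=references))
```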
Frequently Asked Questions
Q: What makes this model unique?
This model significantly outperforms the base Whisper Large v2 on Mandarin Chinese, reducing CER by over 20 percentage points on Common Voice 11. It's specifically optimized for Chinese speech recognition while maintaining the ability to handle different text normalization scenarios.
Q: What are the recommended use cases?
The model is ideal for Mandarin Chinese speech transcription tasks, particularly when high character-level accuracy is required. It's suitable for both general transcription and scenarios requiring normalized text output, though users should be aware of potential limitations with numerical value transcriptions.
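As a sketch of producing normalized output, the transformers library ships a `BasicTextNormalizer` that lowercases text and strips punctuation; whether this matches the normalization used for the reported scores is an assumption.

```python
from transformers.models.whisper.english_normalizer import BasicTextNormalizer

# Basic normalization (lowercasing, punctuation removal). The exact scheme
# behind this model card's "normalized" results may differ.
normalizer = BasicTextNormalizer()

raw_transcript = "今天天气很好，我们去公园散步吧！"
print(normalizer(raw_transcript))
```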