whisper-large-zh-cv11

Maintained By
jonatasgrosman

Whisper Large Chinese (Mandarin)

Property       Value
License        Apache 2.0
Base Model     openai/whisper-large-v2
Training Data  Common Voice 11
Primary Task   Automatic Speech Recognition

What is whisper-large-zh-cv11?

Whisper-large-zh-cv11 is a speech recognition model fine-tuned from OpenAI's Whisper Large v2 for Mandarin Chinese. Developed by Jonatas Grosman, it achieves a Character Error Rate (CER) of 9.55% on the Common Voice 11 test set, compared to 29.90% for the original model.

Implementation Details

The model was trained on both the training and validation splits of Common Voice 11, with 1,000 samples reserved for evaluation during fine-tuning. It uses the Transformer architecture, runs on PyTorch, and can be loaded through the Hugging Face transformers library.

  • Supports both raw and normalized text transcription
  • Handles casing and punctuation
  • Optimized for Mandarin Chinese speech recognition
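
As an illustration of loading the model through transformers, here is a minimal sketch using the library's automatic-speech-recognition pipeline. The Hub ID below is an assumption built from the maintainer and model names on this page, and `transcribe` and `sample.wav` are hypothetical names introduced for the example. It is not the maintainer's reference script.

```python
# Assumed Hub ID: maintainer name + model name as listed on this page.
MODEL_ID = "jonatasgrosman/whisper-large-zh-cv11"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file and return the recognized text."""
    # Imported inside the function so this module can be imported
    # without loading transformers and PyTorch up front.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # When given a file path, the pipeline decodes the audio itself,
    # which requires ffmpeg to be installed on the system.
    result = asr(audio_path)
    return result["text"]


# Hypothetical usage (downloads the model weights on first call):
# print(transcribe("sample.wav"))
```

The first call downloads several gigabytes of weights, so a GPU and a cached model are advisable for anything beyond a quick check.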

Core Capabilities

  • Achieves 9.55% CER and 55.02% WER on Common Voice 11
  • Performs well on out-of-domain data (11.76% CER on Fleurs dataset)
  • Supports specialized handling of numerical transcriptions
  • Includes language and task-specific decoder prompts
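
To make the CER figures concrete, the sketch below computes a character error rate as the Levenshtein edit distance between reference and hypothesis, divided by the reference length. This is the standard definition, not necessarily the exact evaluation script behind the numbers above, which may also normalize text and aggregate edits over the whole test set. The function names and the sample sentences are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters here)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate for a single reference/hypothesis pair."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of six gives a CER of about 16.7%.
print(f"{cer('今天天气很好', '今天天汽很好'):.1%}")
```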

Frequently Asked Questions

Q: What makes this model unique?

This model significantly outperforms the base Whisper Large v2 on Mandarin Chinese, reducing CER by over 20 percentage points on Common Voice 11. It's specifically optimized for Chinese speech recognition while maintaining the ability to handle different text normalization scenarios.
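
The size of that improvement can be checked directly from the two CER values quoted on this page. The short sketch below computes both the absolute and the relative reduction; the function name is illustrative.

```python
def cer_reduction(base_cer, finetuned_cer):
    """Return (absolute reduction in points, relative reduction as a fraction)."""
    absolute = base_cer - finetuned_cer
    relative = absolute / base_cer
    return absolute, relative


# CER values on Common Voice 11, in percent, as reported above.
absolute, relative = cer_reduction(29.90, 9.55)
print(f"absolute: {absolute:.2f} points, relative: {relative:.1%}")
```

With these inputs the absolute reduction is 20.35 percentage points, which corresponds to roughly a 68% relative reduction in character errors.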

Q: What are the recommended use cases?

The model is ideal for Mandarin Chinese speech transcription tasks, particularly when high character-level accuracy is required. It's suitable for both general transcription and scenarios requiring normalized text output, though users should be aware of potential limitations with numerical value transcriptions.
