# Whisper Small Cantonese
| Property | Value |
|---|---|
| Parameter Count | 242M |
| License | Apache 2.0 |
| Paper | Research Paper |
| Model Type | Automatic Speech Recognition |
| CER Score | 7.93% (without punctuation) |
## What is whisper-small-cantonese?
Whisper-small-cantonese is a specialized speech recognition model fine-tuned from OpenAI's Whisper-small checkpoint specifically for Cantonese. Trained on over 934 hours of diverse data, including Common Voice, CantoMap, and YouTube content, it represents a significant advancement in Cantonese ASR.
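The sketch below shows one minimal way to transcribe audio with the Hugging Face `transformers` pipeline. It assumes the checkpoint is published on the Hub under an ID such as `alvanlii/whisper-small-cantonese` (substitute the actual repository ID) and that `ffmpeg` is available for audio decoding.

```python
# Minimal transcription sketch using the transformers ASR pipeline.
# The model ID below is an assumption; replace it with the actual Hub repository ID.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/whisper-small-cantonese",  # assumed Hub ID
    torch_dtype=torch.float16 if device.startswith("cuda") else torch.float32,
    device=device,
)

# Whisper models expect 16 kHz audio; the pipeline resamples input files via ffmpeg.
result = asr("sample_cantonese.wav")
print(result["text"])
```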
## Implementation Details
The model utilizes a transformer-based architecture with several optimizations for performance. It supports both standard and Flash Attention implementations, with the latter reducing inference time from 0.308s to 0.055s per sample on GPU (see the loading sketch after the list below).
- GPU VRAM Usage: ~1.5GB
- Supports speculative decoding for faster processing
- Compatible with Whisper.cpp, and with WhisperX/FasterWhisper via CTranslate2 (CT2) conversion
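The following is a loading sketch for the Flash Attention path, assuming a recent `transformers` release, an installed `flash-attn` package, and a compatible (Ampere or newer) GPU; the model ID is again an assumption.

```python
# Sketch: load the checkpoint with Flash Attention 2 for faster GPU inference.
# Requires the flash-attn package; the model ID is assumed.
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

model_id = "alvanlii/whisper-small-cantonese"  # assumed Hub ID

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)
processor = AutoProcessor.from_pretrained(model_id)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch.float16,
    device="cuda:0",  # Flash Attention requires a CUDA device
)

print(asr("sample_cantonese.wav")["text"])
```

If `flash-attn` is not installed, dropping the `attn_implementation` argument falls back to the standard attention path.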
## Core Capabilities
- Fast inference with Flash Attention support
- Excellent accuracy: 7.93% character error rate (CER) without punctuation
- Efficient processing of long-form audio (see the chunked-inference sketch after this list)
- Flexible deployment options (CPU/GPU)
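For long recordings, the `transformers` pipeline can split audio into chunks and batch them, which is one way to realize the long-form and CPU/GPU flexibility listed above. This is a sketch under the same assumed model ID; the chunk length and batch size are illustrative defaults, not values prescribed by this model card.

```python
# Sketch: chunked long-form transcription on CPU or GPU.
# chunk_length_s splits the audio into 30-second windows; batch_size controls
# how many chunks are decoded together. The model ID is assumed.
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/whisper-small-cantonese",  # assumed Hub ID
    chunk_length_s=30,
    batch_size=8,  # reduce on CPU or low-memory GPUs
    torch_dtype=torch.float16 if device.startswith("cuda") else torch.float32,
    device=device,
)

result = asr("long_cantonese_recording.wav", return_timestamps=True)
print(result["text"])
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```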
## Frequently Asked Questions

### Q: What makes this model unique?
This model stands out for its specialized optimization for Cantonese, extensive training data including pseudo-labeled content, and excellent balance of speed and accuracy. It achieves state-of-the-art performance while maintaining reasonable resource requirements.
### Q: What are the recommended use cases?
The model is ideal for Cantonese speech transcription tasks, particularly in applications requiring real-time or near-real-time processing. It's suitable for both production environments and research applications, especially when dealing with varied Cantonese dialects and accents.