whisper-small-cantonese

Maintained by: alvanlii

Whisper Small Cantonese

Property           Value
Parameter Count    242M
License            Apache 2.0
Paper              Research Paper
Model Type         Automatic Speech Recognition
CER Score          7.93% (without punctuation)

What is whisper-small-cantonese?

Whisper-small-cantonese is a specialized speech recognition model fine-tuned from OpenAI's Whisper-small architecture for Cantonese. Trained on over 934 hours of diverse data, including Common Voice, CantoMap, and YouTube content, it represents a significant step forward in Cantonese ASR.
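
A minimal usage sketch (not taken from the model card) with the Hugging Face transformers ASR pipeline; the repo id alvanlii/whisper-small-cantonese, the audio filename, and the language setting are assumptions for illustration:

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/whisper-small-cantonese",  # assumed Hugging Face repo id
    device=device,
)

# Whisper pins language/task via decoder prompt tokens; Cantonese is usually
# requested through the Chinese language token.
result = asr(
    "sample_cantonese.wav",  # hypothetical local audio file
    generate_kwargs={"language": "zh", "task": "transcribe"},
)
print(result["text"])
```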

Implementation Details

The model uses a transformer-based encoder-decoder architecture with several performance optimizations. It supports both standard and Flash Attention implementations; with Flash Attention, GPU inference time drops from 0.308 s to 0.055 s per sample (see the loading sketch after the list below).

  • GPU VRAM Usage: ~1.5GB
  • Supports speculative decoding for faster processing
  • Compatible with Whisper.cpp and, via CTranslate2 (CT2) conversion, with WhisperX/FasterWhisper
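
A hedged sketch of loading the model with Flash Attention 2 and running a single transcription; the repo id, the audio file, and the exact flags are assumptions based on the standard transformers API rather than the model card:

```python
import soundfile as sf
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "alvanlii/whisper-small-cantonese"  # assumed Hugging Face repo id
processor = WhisperProcessor.from_pretrained(model_id)

# attn_implementation="flash_attention_2" needs the flash-attn package and an
# Ampere-or-newer GPU; drop the argument (or use "sdpa") to fall back to the default.
model = WhisperForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

audio, sr = sf.read("sample_cantonese.wav")  # hypothetical 16 kHz mono clip
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
features = inputs.input_features.to("cuda", dtype=torch.float16)

predicted_ids = model.generate(input_features=features, language="zh", task="transcribe")
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```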

Core Capabilities

  • Fast inference with Flash Attention support
  • Excellent accuracy with 7.93% CER (without punctuation)
  • Efficient processing of long-form audio (see the chunked-inference sketch after this list)
  • Flexible deployment options (CPU/GPU)
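
For long-form audio, one common approach (assumed here, not prescribed by the model card) is chunked inference through the transformers pipeline; the chunk and batch sizes below are illustrative:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="alvanlii/whisper-small-cantonese",  # assumed Hugging Face repo id
    chunk_length_s=30,  # split long recordings into 30-second windows
    batch_size=8,       # decode several windows per forward pass on GPU
    device=0,
)

text = asr(
    "long_recording.wav",  # hypothetical long-form audio file
    generate_kwargs={"language": "zh", "task": "transcribe"},
)["text"]
print(text)
```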

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its specialized optimization for Cantonese, extensive training data including pseudo-labeled content, and excellent balance of speed and accuracy. It achieves state-of-the-art performance while maintaining reasonable resource requirements.

Q: What are the recommended use cases?

The model is ideal for Cantonese speech transcription tasks, particularly in applications requiring real-time or near-real-time processing. It's suitable for both production environments and research applications, especially when dealing with varied Cantonese dialects and accents.
