Whisper-Large-V3-French
| Property | Value |
|---|---|
| Parameter Count | 1.61B |
| License | MIT |
| Paper | Whisper Paper |
| Model Type | Speech Recognition (ASR) |
What is whisper-large-v3-french?
Whisper-Large-V3-French is a speech recognition model fine-tuned from OpenAI's Whisper Large V3 and optimized for French. It reports Word Error Rates (WER) ranging from 3.98% to 8.91% across its benchmark datasets.
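To make the WER figures concrete, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch skips the text normalization that benchmark evaluations typically apply, so it will not reproduce the reported numbers exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("le chat dort sur le canapé", "le chat dort sur canapé"))
```

Under this definition, a WER of 3.98% corresponds to roughly four word errors per hundred reference words.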
Implementation Details
The model was trained on over 2,500 hours of French speech data drawn from multiple datasets, including Common Voice, Multilingual LibriSpeech, and VoxPopuli. It is trained to predict casing, punctuation, and numbers in its transcriptions.
- Usable from several frameworks, including Hugging Face Transformers, OpenAI Whisper, and Faster Whisper
- Includes speculative decoding support for 2x faster inference
- Runs on both CPU and GPU
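As an illustration of the Hugging Face Transformers route, the sketch below builds an automatic-speech-recognition pipeline on either CPU or GPU and forces French transcription. Several things here are assumptions rather than details from this page: the Hub identifier `bofenghuang/whisper-large-v3-french`, the helper names `pipeline_kwargs` and `transcribe`, and the audio path. It also assumes `transformers` and `torch` are installed, plus `ffmpeg` for decoding audio files.

```python
# Assumed Hub identifier; this page does not state the repository name.
MODEL_ID = "bofenghuang/whisper-large-v3-french"


def pipeline_kwargs(use_gpu: bool) -> dict:
    """Arguments for transformers.pipeline, selecting CPU or the first GPU."""
    return {
        "task": "automatic-speech-recognition",
        "model": MODEL_ID,
        "device": "cuda:0" if use_gpu else "cpu",
    }


def transcribe(audio_path: str, use_gpu: bool = False) -> str:
    """Transcribe one audio file to French text."""
    # Imported lazily so the helper above can be used without the heavy dependency.
    from transformers import pipeline

    asr = pipeline(**pipeline_kwargs(use_gpu))
    # Whisper is multilingual, so the language and task are set explicitly.
    result = asr(audio_path, generate_kwargs={"language": "french", "task": "transcribe"})
    return result["text"]


if __name__ == "__main__":
    # "sample_fr.wav" is a hypothetical local file.
    print(transcribe("sample_fr.wav"))
```

The first call downloads the model weights, so expect a noticeable delay and several gigabytes of disk usage.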
Core Capabilities
- High-accuracy French speech transcription, with WER as low as 3.98% on the Multilingual LibriSpeech (MLS) dataset
- Robust performance on both short-form and long-form audio
- Handles various French accents including African French variants
- Supports parallel processing for long audio files
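The idea behind parallel long-form processing can be sketched as follows: the recording is cut into overlapping windows that can be transcribed as a batch and then stitched back together. Whisper operates on 30-second windows; the 5-second overlap and the function name `chunk_spans` are assumptions chosen for illustration, and this is not the exact algorithm any particular library uses.

```python
def chunk_spans(
    total_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping windows over the audio."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be non-negative and smaller than chunk_s")

    step = chunk_s - overlap_s  # how far each window advances
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second recording split into 30-second windows with 5 seconds of overlap.
    for start, end in chunk_spans(70.0):
        print(f"{start:5.1f}s to {end:5.1f}s")
```

The overlap gives each window some context at its edges, which helps when merging transcripts across chunk boundaries. In Hugging Face Transformers, comparable behaviour is exposed through the pipeline's `chunk_length_s` and `batch_size` arguments.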
Frequently Asked Questions
Q: What makes this model unique?
The model is optimized specifically for French, reporting WERs between 3.98% and 8.91% on its benchmarks while also predicting casing, punctuation, and numbers. It has been evaluated on both in-distribution and out-of-distribution datasets, which supports its robustness across different use cases.
Q: What are the recommended use cases?
The model is suited to French speech transcription tasks such as call center conversations, academic lectures, media content, and general speech recognition applications. It handles both short-form (< 30 seconds) and long-form transcription.