wav2vec2-xls-r-300m-german-de
| Property | Value |
|---|---|
| Base Model | facebook/wav2vec2-xls-r-300m |
| Training Dataset | Mozilla Common Voice 7.0 (German) |
| Word Error Rate | 20.16% |
| Model Hub | Hugging Face |
What is wav2vec2-xls-r-300m-german-de?
This is a German automatic speech recognition (ASR) model, fine-tuned from Facebook's pretrained facebook/wav2vec2-xls-r-300m checkpoint on the German portion of Mozilla Common Voice 7.0. It reaches a Word Error Rate (WER) of 20.16% on the evaluation set.
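To make the WER figure concrete, here is a minimal sketch of the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, and the evaluation script behind the 20.16% figure may normalize text (casing, punctuation) differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one dropped word out of five.
wer = word_error_rate("das ist ein kleiner test", "das ist ein test")
print(f"{wer:.2%}")
```

By this definition, a WER of about 20% means roughly one word-level error for every five reference words.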
Implementation Details
The model was trained with a learning rate of 7.5e-05, a batch size of 32, and a linear learning rate schedule with 2000 warmup steps. Training ran for 3.4 epochs with mixed precision (Native AMP).
- Uses Adam optimizer with betas=(0.9,0.999)
- Implements gradient accumulation steps of 4
- Achieves final validation loss of 0.1768
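The schedule and batch settings above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the original training code: `TOTAL_STEPS` is a hypothetical value (the source gives epochs, not total steps), the decay is assumed to go linearly to zero after warmup, and the batch size of 32 is assumed to apply per accumulation step.

```python
PEAK_LR = 7.5e-5        # from the training summary
WARMUP_STEPS = 2000     # from the training summary
BATCH_SIZE = 32         # assumed to be the batch per accumulation step
GRAD_ACCUM_STEPS = 4    # from the training summary
TOTAL_STEPS = 10_000    # hypothetical; not stated in the source


def linear_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


# Gradients from 4 batches of 32 are summed before each optimizer update.
effective_batch_size = BATCH_SIZE * GRAD_ACCUM_STEPS

for step in (0, 1000, 2000, 6000, 10_000):
    print(step, linear_lr(step))
print("effective batch size:", effective_batch_size)
```

Under these assumptions each optimizer update sees 128 examples, and the learning rate peaks at step 2000 before decaying.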
Core Capabilities
- German speech recognition at 20.16% WER on the evaluation set
- Moderate size of roughly 300M parameters
- Fine-tuned on crowd-sourced Common Voice recordings rather than studio audio
- Supports variable-length audio inputs
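Variable-length input works because the convolutional feature encoder simply emits more frames for longer audio. The sketch below estimates the frame count for a given number of samples, assuming the default wav2vec 2.0 encoder configuration (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2) and 16 kHz input. The constant and function names are illustrative, and the values should be checked against this model's own configuration before relying on them.

```python
# Assumed default wav2vec 2.0 feature-encoder layout; verify against
# the model's config before depending on these numbers.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # samples per second


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the encoder for a raw waveform of num_samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip too short to produce a frame
        length = (length - kernel) // stride + 1
    return length


print(num_output_frames(1 * SAMPLE_RATE))  # 49
print(num_output_frames(2 * SAMPLE_RATE))  # 99
```

With these settings the encoder produces about one frame per 20 ms of audio, so sequence length, and with it memory use, grows with clip duration.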
Frequently Asked Questions
Q: What makes this model unique?
It pairs the multilingual wav2vec2-xls-r-300m pretrained model with German-specific fine-tuning, balancing accuracy (20.16% WER) against a moderate 300M-parameter footprint.
Q: What are the recommended use cases?
The model suits German speech recognition tasks such as transcription services and voice-based applications. It fits best where a WER of around 20%, roughly one word-level error in every five reference words, is acceptable, for example when a human reviews the transcript afterwards.
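For a basic transcription use case, a minimal sketch with the Hugging Face `transformers` ASR pipeline might look like the following. The repository ID is a placeholder, because the source does not name the hosting account, and `transcribe` is a hypothetical helper. Running it requires `transformers`, a backend such as PyTorch, and ffmpeg for decoding audio files.

```python
# Placeholder repository ID: replace "your-namespace" with the account
# that actually hosts the model on the Hugging Face Hub.
MODEL_ID = "your-namespace/wav2vec2-xls-r-300m-german-de"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the automatic-speech-recognition pipeline."""
    # Imported lazily so defining this helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"]


# Example call (downloads the model, so it needs network access):
# print(transcribe("sample_german.wav"))
```

The pipeline loads the model on every call here for simplicity. An application that transcribes many files would build the pipeline once and reuse it.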