wav2vec2-xls-r-300m-ftspeech
Property | Value |
---|---|
Parameter Count | 315M |
Model Type | Speech Recognition |
Base Architecture | wav2vec2-xls-r-300m |
License | Danish Parliament License |
Language | Danish |
What is wav2vec2-xls-r-300m-ftspeech?
This is a Danish speech recognition model fine-tuned on the FTSpeech dataset, which contains 1,800 hours of transcribed speeches from the Danish parliament. It is built on Facebook's wav2vec2-xls-r-300m checkpoint, a multilingual pretrained speech model, and adapts it specifically to Danish.
Implementation Details
The model uses the XLS-R architecture with 315M parameters, with weights stored as F32 (32-bit floating point) tensors. It achieves a Word Error Rate (WER) of 17.91% on Danish Common Voice 8.0 and 13.84% on the Alvenir test set when using a 5-gram language model.
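To make the WER figures concrete: WER is the word-level edit distance between a reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal pure-Python version. The function name `word_error_rate` and the example sentences are illustrative assumptions, and the text normalization used in the reported evaluation is not specified here, so this is not necessarily how the published numbers were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# Made-up example sentences (not taken from FTSpeech): one substituted
# word out of seven reference words.
reference = "jeg vil gerne takke ministeren for svaret"
hypothesis = "jeg vil gerne takke minister for svaret"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a WER of 13.84% corresponds to roughly one word error for every seven reference words.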
- Fine-tuned on high-quality parliamentary speech data
- Supports both standalone and language model-enhanced inference
- Optimized for Danish language processing
- Built on a Transformer architecture and usable through the Hugging Face Transformers library
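The difference between standalone and language-model-enhanced inference comes down to decoding. wav2vec 2.0 models fine-tuned for ASR are typically trained with CTC, so standalone inference usually means greedy CTC decoding: take the most likely token per audio frame, collapse consecutive repeats, and drop the blank token. The sketch below shows that step on a toy example. The vocabulary, frame IDs, and the function name `ctc_greedy_decode` are hypothetical and chosen only for illustration. They are not the model's real vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions, drop blanks, and join characters."""
    # Step 1: merge runs of identical consecutive predictions.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the CTC blank token.
    chars = [id_to_char[token_id] for token_id in collapsed if token_id != blank_id]
    # Step 3: turn the word-delimiter token into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (assumption): id 0 is the blank/pad token and
# "|" marks word boundaries, as in common Wav2Vec2 CTC tokenizers.
toy_vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "j"}
# Per-frame argmax predictions for a short clip (made up).
frame_predictions = [2, 2, 0, 3, 3, 3, 0, 4, 1, 1, 2, 0, 3, 4]

print(ctc_greedy_decode(frame_predictions, toy_vocab))
```

Language-model-enhanced inference replaces this greedy step with a beam search that rescores candidate transcripts using an n-gram model (a 5-gram model in the reported results), which is where the lower WER comes from.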
Core Capabilities
- Automatic speech recognition for Danish language
- High accuracy transcription of formal speech
- Compatible with PyTorch framework
- Can be deployed behind real-time inference endpoints
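As a rough illustration of these capabilities, the sketch below transcribes a Danish audio file with the Hugging Face `transformers` ASR pipeline on top of PyTorch. Several things here are assumptions rather than facts stated in this card: the Hub identifier in `MODEL_ID`, the helper name `transcribe`, and the 30-second chunk length. It also assumes `transformers` and `torch` are installed and that `ffmpeg` is available for decoding audio files.

```python
import sys

# Assumed Hugging Face Hub id; this card does not state the repository name.
MODEL_ID = "saattrupdan/wav2vec2-xls-r-300m-ftspeech"
# wav2vec 2.0 / XLS-R checkpoints expect 16 kHz mono audio.
TARGET_SAMPLE_RATE = 16_000


def transcribe(audio_path: str, model_id: str = MODEL_ID,
               chunk_length_s: float = 30.0) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so that this module loads even without transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # Chunking keeps memory bounded on long recordings such as full speeches.
    result = asr(audio_path, chunk_length_s=chunk_length_s)
    return result["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python transcribe.py tale.wav
    print(transcribe(sys.argv[1]))
```

When given a file path, the pipeline resamples the audio to the rate the model expects. If you pass raw arrays instead, they should already be sampled at `TARGET_SAMPLE_RATE`.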
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned on Danish parliamentary speeches, which makes it particularly well suited to formal Danish speech. It combines the multilingual XLS-R pretrained architecture with 1,800 hours of in-domain training data, and reaches a WER of 13.84% on the Alvenir test set when using a 5-gram language model.
Q: What are the recommended use cases?
The model is ideal for transcribing Danish speech in formal contexts, particularly parliamentary or official speeches. It can be used in both academic and professional settings where accurate Danish language transcription is required.