# wav2vec2-large-xlsr-53-swedish

| Property | Value |
|---|---|
| License | Apache 2.0 |
| Author | KBLab |
| Performance | WER: 14.3%, CER: 4.9% |
| Downloads | 41,803 |
## What is wav2vec2-large-xlsr-53-swedish?
This is a fine-tuned version of Facebook's wav2vec2-large-xlsr-53 model specifically optimized for Swedish speech recognition. The model has been trained on the NST Swedish Dictation dataset and Common Voice, making it highly effective for Swedish automatic speech recognition tasks.
## Implementation Details
The model was first pre-trained for 50 epochs on 1,000 hours of Swedish radio broadcasts, then fine-tuned on the NST Swedish Dictation and Common Voice datasets. Input audio must be sampled at 16 kHz for correct results.
- Built on wav2vec2-large-xlsr-53 architecture
- Supports PyTorch and Transformers implementation
- Includes custom processor for audio preprocessing
## Core Capabilities
- High-accuracy Swedish speech recognition
- Efficient processing of 16kHz audio input
- Compatible with standard audio processing libraries
- Direct transcription without requiring a language model
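Transcription without an external language model works because wav2vec 2.0 is trained with a CTC objective: the decoder simply collapses repeated predictions and drops blank tokens. A toy sketch of greedy CTC decoding (the vocabulary and blank index below are illustrative, not the model's actual tokenizer):

```python
def ctc_greedy_decode(ids, vocab, blank=0):
    """Greedy CTC decoding: collapse repeated ids, then drop blank tokens."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy vocabulary; index 0 is the CTC blank token.
vocab = {0: "<pad>", 1: "h", 2: "e", 3: "j"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 0, 3], vocab))  # "hej" ("hi" in Swedish)
```

A language model can still be layered on top for rescoring, but as the card notes, it is not required.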
## Frequently Asked Questions
Q: What makes this model unique?
This model combines large-scale pre-training on Swedish radio broadcasts with fine-tuning on high-quality dictation datasets, resulting in state-of-the-art performance for Swedish ASR. Its 14.3% WER makes it dependable for practical applications.
Q: What are the recommended use cases?
The model is ideal for Swedish speech transcription, whether in near-real-time or batch settings. It is particularly well suited to media transcription, voice commands, and automated subtitling.
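Batch and near-real-time use cases typically require splitting long recordings into fixed-length windows before transcription. A minimal sketch (the 20 s window and 1 s overlap are illustrative choices, not values from this card):

```python
def chunk_audio(samples, sr=16_000, window_s=20.0, overlap_s=1.0):
    """Split a 1-D sample sequence into overlapping fixed-length windows.

    Overlap helps avoid cutting words at chunk boundaries; the last
    chunk may be shorter than the window.
    """
    size = int(window_s * sr)
    step = size - int(overlap_s * sr)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    return [samples[i:i + size] for i in range(0, len(samples), step)]

# Tiny example with sr=1 so the numbers are easy to follow:
chunks = chunk_audio(list(range(10)), sr=1, window_s=4, overlap_s=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9], [9]]
```

Each chunk can then be passed through the model independently and the transcripts concatenated, deduplicating text in the overlapped regions.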