wav2vec2-xlsr-300m-finnish-lm
| Property | Value |
|---|---|
| Parameter Count | 300 million |
| Model Type | Speech Recognition (ASR) |
| Architecture | Wav2Vec2 XLS-R |
| Training Data | 275.6 hours of Finnish speech |
| Best WER (with LM) | 8.16% |
What is wav2vec2-xlsr-300m-finnish-lm?
This is a fine-tuned version of Facebook's wav2vec2-xls-r-300m model, adapted for Finnish automatic speech recognition (ASR). The base XLS-R model uses the wav2vec 2.0 architecture and was pretrained on 436k hours of multilingual speech data. The release also includes a Finnish KenLM language model, which is applied during decoding to improve transcription accuracy.
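A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub and loaded through the `transformers` ASR pipeline. The Hub namespace is not given here, so `model_id` is left as a parameter; the helper names `build_transcriber` and `transcribe` are hypothetical. Language-model decoding is assumed to require the optional `pyctcdecode` and `kenlm` packages, and reading audio from a file path is assumed to require `ffmpeg`.

```python
from typing import Any, Callable


def build_transcriber(model_id: str) -> Any:
    """Create an ASR pipeline for the given Hub model ID.

    The import is deferred so this module can be loaded without
    transformers installed. If the repository ships a language model
    and pyctcdecode/kenlm are available, the pipeline is expected to
    use LM-boosted decoding; otherwise it falls back to plain CTC.
    """
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(asr: Callable[[str], dict], audio_path: str) -> str:
    """Run the pipeline on one audio file and return the transcript text."""
    return asr(audio_path)["text"]


# Example (not executed here; downloads the model on first use):
# asr = build_transcriber("<namespace>/wav2vec2-xlsr-300m-finnish-lm")
# print(transcribe(asr, "sample_fi.wav"))
```

Keeping the pipeline construction separate from the transcription call means the model is loaded once and reused across many clips.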
Implementation Details
The model was trained on a combination of datasets, with the majority (82.73%) coming from the Aalto Finnish Parliament ASR Corpus. Training ran on a Tesla V100 GPU with an 8-bit Adam optimizer and a linear learning rate scheduler. The model achieved its best performance after 10 epochs of training. Key settings:
- Learning rate: 5e-04 with 500 warmup steps
- Batch size: 32 for both training and evaluation
- Mixed precision training with Native AMP
- Includes fine-tuned acoustic model and KenLM language model
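The learning rate settings above can be illustrated with a short sketch of a linear schedule with warmup: the rate rises linearly to 5e-04 over the first 500 steps, then decays linearly to zero. The total number of training steps is not stated here, so `total_steps` is a hypothetical parameter, and the function name is illustrative rather than taken from the training script.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-4,
    warmup_steps: int = 500,
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=10_000 is an assumed value, for illustration only.
for s in (0, 250, 500, 5_250, 10_000):
    print(s, linear_schedule_lr(s, total_steps=10_000))
```

With the assumed 10,000 total steps, the rate peaks at step 500 and reaches half its peak at step 5,250, the midpoint of the decay phase.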
Core Capabilities
- Transcription of Finnish speech to text
- Optimal performance on audio clips up to 20 seconds
- Strong performance on formal Finnish speech
- WER of 8.16% with language model, 17.92% without
- Supports real-time transcription with appropriate audio chunking
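For readers unfamiliar with the WER figures above, the sketch below shows how word error rate is commonly computed (word-level edit distance divided by the number of reference words) and how the two reported numbers compare. This is a generic illustration, not the evaluation script behind the reported results; the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# Reported figures: 17.92% without the language model, 8.16% with it.
print(f"{relative_reduction(17.92, 8.16):.1%}")
```

By this calculation the language model removes a little over half of the word errors (about 54% relative reduction).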
Frequently Asked Questions
Q: What makes this model unique?
This model combines a multilingual pretrained speech model (XLS-R) with Finnish fine-tuning data and a Finnish KenLM language model, making it particularly effective for Finnish ASR tasks. Because most of its training data comes from parliamentary proceedings, it is especially strong on formal Finnish speech.
Q: What are the recommended use cases?
The model is best suited to transcribing formal Finnish speech, particularly in professional or official contexts. It works best on audio clips of up to 20 seconds with clear, standard Finnish pronunciation, rather than heavy dialects or informal speech.
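Longer recordings can be split to respect the 20-second guideline. The sketch below is a deliberately simple illustration that cuts a sequence of samples into fixed windows. It assumes 16 kHz mono audio (the rate wav2vec 2.0 models expect), and the function name `chunk_audio` is hypothetical. Hard cuts can split a word in two, so a real setup would more likely cut at silences or use overlapping windows.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input, as wav2vec 2.0 expects
MAX_SECONDS = 20.0    # clip length recommended above


def chunk_audio(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    max_seconds: float = MAX_SECONDS,
) -> Iterator[Sequence[float]]:
    """Yield consecutive, non-overlapping chunks of at most `max_seconds`."""
    chunk_len = int(sample_rate * max_seconds)
    if chunk_len <= 0:
        raise ValueError("sample_rate * max_seconds must be positive")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


# A 45-second silent signal splits into 20 s + 20 s + 5 s.
signal = [0.0] * (SAMPLE_RATE * 45)
print([len(c) / SAMPLE_RATE for c in chunk_audio(signal)])
```

Each chunk can then be transcribed separately and the resulting texts joined in order.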