wav2vec2-xls-r-300m-sk-cv8
Property | Value |
---|---|
Base Model | facebook/wav2vec2-xls-r-300m |
Task | Speech Recognition (Slovak) |
Performance | WER: 49.57%, CER: 13.33% |
Author | comodoro |
Model Link | Hugging Face |
What is wav2vec2-xls-r-300m-sk-cv8?
This is a speech recognition model for Slovak, fine-tuned from Facebook's wav2vec2-xls-r-300m checkpoint on the Common Voice 8.0 dataset. It provides speech-to-text for Slovak language applications, with a reported WER of 49.57% and CER of 13.33%.
Implementation Details
The model uses the wav2vec2 architecture with a CTC (Connectionist Temporal Classification) output layer. It expects 16 kHz audio input and was trained with native AMP mixed-precision training.
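To illustrate what CTC-based decoding means in practice, the following minimal sketch shows greedy CTC decoding: take the most likely token for each audio frame, merge consecutive repeats, then drop the blank token. The toy vocabulary, the `blank_id` of 0, and the function name `ctc_greedy_decode` are assumptions made for illustration; the model's real vocabulary and blank index come from its tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame token ids into text using the greedy CTC rule."""
    # Step 1: merge runs of identical consecutive ids (one id per run).
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated characters.
    return "".join(id_to_char[token] for token in collapsed if token != blank_id)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<blank>", 1: "a", 2: "h", 3: "o", 4: "j"}

# Per-frame argmax ids as they might come out of the acoustic model.
frames = [0, 1, 1, 0, 2, 2, 2, 3, 0, 4, 4]
print(ctc_greedy_decode(frames, toy_vocab))
```

A blank between two identical ids keeps them as separate characters, so `[1, 0, 1]` decodes to two letters rather than one.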
The reported training hyperparameters are:
- Learning Rate: 7e-4 with linear scheduler and 500 warmup steps
- Batch Size: 32 (640 total with gradient accumulation)
- Training Duration: 50 epochs
- Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
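The learning-rate and batch-size settings above can be sketched in plain Python. The function below follows the usual shape of a linear schedule with warmup (linear ramp to the peak, then linear decay to zero). The value of `total_steps` is hypothetical, since the total number of optimizer steps is not stated here, and the accumulation factor is simply 640 / 32 under the assumption of a single training device.

```python
PEAK_LR = 7e-4
WARMUP_STEPS = 500
PER_STEP_BATCH = 32
EFFECTIVE_BATCH = 640

# Assumption: one device, so the accumulation factor is the plain ratio.
ACCUMULATION_STEPS = EFFECTIVE_BATCH // PER_STEP_BATCH


def linear_warmup_decay(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr down to 0 at total_steps.
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 5_000  # hypothetical total number of optimizer steps
    print("gradient accumulation steps:", ACCUMULATION_STEPS)
    for step in (0, 250, 500, 2_750, 5_000):
        print(step, linear_warmup_decay(step, total))
```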
Core Capabilities
- Direct speech-to-text transcription without requiring a language model
- Handles 16kHz audio input (with included resampling capability)
- Batch processing support with attention masking
- Optimized for Slovak language recognition
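The capabilities listed above could be exercised roughly as follows. This is a sketch, not the author's reference script: it assumes the model is published under the repository id `comodoro/wav2vec2-xls-r-300m-sk-cv8` (inferred from the author and model name), that `torch`, `torchaudio`, and `transformers` are installed, and that the audio file paths passed in exist. The function name `transcribe` is hypothetical.

```python
MODEL_ID = "comodoro/wav2vec2-xls-r-300m-sk-cv8"  # assumed repository id
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(paths):
    """Transcribe a batch of audio files with greedy CTC decoding (no language model)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID).eval()

    waveforms = []
    for path in paths:
        waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
        waveform = waveform.mean(dim=0)  # down-mix to mono
        if sample_rate != TARGET_SAMPLE_RATE:
            # Resample anything that is not already 16 kHz.
            waveform = torchaudio.functional.resample(
                waveform, sample_rate, TARGET_SAMPLE_RATE
            )
        waveforms.append(waveform.numpy())

    # padding=True pads clips to a common length; when the feature extractor
    # is configured to return one, an attention mask marks the real samples.
    inputs = processor(
        waveforms,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: most likely token per frame, collapsed by the tokenizer.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)


# Example call with hypothetical file names:
# print(transcribe(["sample_sk_1.wav", "sample_sk_2.wav"]))
```

Passing several files at once uses the batch path, and the attention mask keeps the padded samples of shorter clips from influencing the output.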
Frequently Asked Questions
Q: What makes this model unique?
The model targets Slovak speech recognition specifically: it starts from the multilingual wav2vec2-xls-r-300m checkpoint and is fine-tuned on the Common Voice 8.0 dataset. It can be used directly, without a separate language model, which makes it practical for Slovak speech recognition tasks.
Q: What are the recommended use cases?
The model is suited to Slovak speech transcription in either real-time or batch settings. Example applications include voice assistants, transcription services, and audio content analysis tools for Slovak-language content.