wav2vec2-xls-r-300m-sk-cv8

Property	Value
Base Model	facebook/wav2vec2-xls-r-300m
Task	Speech Recognition (Slovak)
Performance	WER: 49.57%, CER: 13.33%
Author	comodoro
Model Link	Hugging Face
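
For readers unfamiliar with the metrics in the table: WER and CER are edit-distance ratios, counted over words and characters respectively. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used for this model. The example sentences are made up, and real evaluations usually normalize case and punctuation first and sum errors over the whole test set.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example pair, for illustration only.
reference = "dobrý deň ako sa máš"
hypothesis = "dobry deň ako máš"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

A single wrong character makes the whole word count as an error, which is why WER is typically much higher than CER for the same output.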

What is wav2vec2-xls-r-300m-sk-cv8?

This is a speech recognition model for the Slovak language, fine-tuned from Facebook's wav2vec2-xls-r-300m checkpoint on the Common Voice 8.0 dataset. It provides speech-to-text transcription for Slovak-language applications.

Implementation Details

The model uses the wav2vec2 architecture with a CTC (Connectionist Temporal Classification) output layer. It expects 16 kHz audio input and was trained with native AMP mixed-precision training, using the following hyperparameters:

Learning Rate: 7e-4 with linear scheduler and 500 warmup steps
Batch Size: 32 (640 total with gradient accumulation)
Training Duration: 50 epochs
Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
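
To make these numbers concrete, the sketch below works out the gradient accumulation factor implied by the two batch sizes and the shape of a linear schedule with warmup. It assumes a single training device. The total step count is a hypothetical value, since the source does not state it. The schedule is the common "linear warmup, then linear decay to zero" form, which may differ in detail from the one actually used.

```python
PEAK_LR = 7e-4          # learning rate from the list above
WARMUP_STEPS = 500      # warmup steps from the list above
PER_DEVICE_BATCH = 32   # batch size from the list above
TOTAL_BATCH = 640       # effective batch size from the list above


def accumulation_steps(total_batch, per_device_batch, num_devices=1):
    """Gradient accumulation steps needed to reach the effective batch size."""
    return total_batch // (per_device_batch * num_devices)


def linear_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Assumption: one device, so 640 / 32 accumulation steps per optimizer update.
print("accumulation steps:", accumulation_steps(TOTAL_BATCH, PER_DEVICE_BATCH))

# Hypothetical total number of optimizer steps, for illustration only.
TOTAL_STEPS = 2000
for step in (0, 250, 500, 1250, 2000):
    print(step, f"{linear_lr(step, TOTAL_STEPS):.2e}")
```

With these assumptions the rate rises from zero to 7e-4 over the first 500 steps and then falls linearly back to zero at the final step.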

Core Capabilities

Direct speech-to-text transcription without requiring a language model
Handles 16kHz audio input (with included resampling capability)
Batch processing support with attention masking
Optimized for Slovak language recognition
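
The capabilities above can be sketched with the Hugging Face Transformers and torchaudio libraries. This is a minimal, untested-against-this-checkpoint example rather than the author's reference code. The repository id is an assumption built from the author and model names in the table, and `transcribe` is a hypothetical helper name.

```python
MODEL_ID = "comodoro/wav2vec2-xls-r-300m-sk-cv8"  # assumed Hub repository id
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(paths, model_id=MODEL_ID):
    """Transcribe a batch of audio files with greedy CTC decoding (no language model)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id).eval()

    waveforms = []
    for path in paths:
        waveform, sample_rate = torchaudio.load(path)  # shape: (channels, samples)
        waveform = waveform.mean(dim=0)                # downmix to mono
        if sample_rate != TARGET_SAMPLE_RATE:
            waveform = torchaudio.functional.resample(
                waveform, sample_rate, TARGET_SAMPLE_RATE
            )
        waveforms.append(waveform.numpy())

    # padding=True pads shorter clips to the longest one in the batch; the
    # attention mask (returned when the feature extractor is configured to
    # provide it) tells the model which samples are padding.
    inputs = processor(
        waveforms,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    return processor.batch_decode(predicted_ids)
```

Calling `transcribe(["clip1.wav", "clip2.wav"])` with two hypothetical file paths would return a list with one transcript string per file.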

Frequently Asked Questions

Q: What makes this model unique?

The model targets Slovak specifically: it starts from the multilingual wav2vec2-xls-r-300m checkpoint and is fine-tuned on the Common Voice 8.0 dataset. It can be used directly, without a separate language model, which keeps it practical for Slovak speech recognition tasks.
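
Using a CTC model without a language model usually means greedy decoding: take the most likely token for each audio frame, collapse consecutive repeats, and drop the blank token. The sketch below illustrates that rule in pure Python. The vocabulary and token ids are hypothetical. The "|" word delimiter and the padding token serving as the CTC blank follow the usual wav2vec2 tokenizer convention, which is assumed to apply here.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse repeated frame predictions and remove blanks to get text."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank token.
        if idx != prev and idx != blank_id:
            chars.append(id_to_char[idx])
        prev = idx
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, for illustration only.
vocab = {0: "<pad>", 1: "|", 2: "á", 3: "n", 4: "o"}

# Per-frame argmax ids as a model might produce them.
frames = [0, 2, 2, 0, 3, 4, 4, 1, 1, 0, 3, 3, 4, 0]
print(ctc_greedy_decode(frames, vocab))

# A blank between two identical ids keeps both characters.
print(ctc_greedy_decode([3, 0, 3], vocab))
```

The blank token is what lets the model emit a genuine double letter: without a blank between them, two identical frame predictions are merged into one character.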

Q: What are the recommended use cases?

The model is intended for Slovak speech transcription, in either real-time or batch processing of audio content. Typical applications include voice assistants, transcription services, and audio content analysis tools for Slovak-language content.