wav2vec2-xls-r-parlaspeech-hr
Property | Value |
---|---|
Base Model | facebook/wav2vec2-xls-r-300m |
Training Data | 300 hours of ParlaSpeech-HR v1.0 |
Test WER | 0.0761 |
Paper | ParlaSpeech-HR Paper |
What is wav2vec2-xls-r-parlaspeech-hr?
This is a Croatian automatic speech recognition (ASR) model built on the facebook/wav2vec2-xls-r-300m base model. It was fine-tuned on 300 hours of Croatian parliamentary speech from the ParlaSpeech-HR v1.0 dataset and reaches a word error rate (WER) of 0.0761 on the test set.
Implementation Details
The model was trained for 8 epochs with a batch size of 16 and 4 gradient accumulation steps. The learning rate was set to 3e-4 with 500 warmup steps.
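To make the hyperparameters concrete, here is a minimal sketch that collects them in one place and derives two quantities from them. The class name `FineTuneConfig` and its methods are hypothetical. It also assumes that the batch size of 16 is per device and that the learning rate rises linearly during warmup; the schedule after warmup is not stated here, so the sketch simply holds the peak value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as reported for this model (names are illustrative)."""

    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 4
    num_epochs: int = 8
    learning_rate: float = 3e-4
    warmup_steps: int = 500

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Samples that contribute to one optimizer update.
        # Assumption: the reported batch size of 16 is per device.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * num_devices
        )

    def warmup_lr(self, step: int) -> float:
        # Assumption: linear warmup from 0 to the peak learning rate.
        # The post-warmup schedule is unknown, so the peak is returned as-is.
        if step >= self.warmup_steps:
            return self.learning_rate
        return self.learning_rate * step / self.warmup_steps


cfg = FineTuneConfig()
print(cfg.effective_batch_size())  # 16 * 4 on a single device
print(cfg.warmup_lr(250))          # halfway through warmup
```

On a single device, gradient accumulation means each optimizer update sees 16 × 4 = 64 samples, which is usually the number to compare against other training setups.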
- Achieves 0.0234 Character Error Rate (CER) on test set
- Demonstrates 0.0761 Word Error Rate (WER) on test set
- Implements the Wav2Vec2ForCTC architecture for speech processing
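The two error rates above are both edit-distance ratios: WER counts word-level substitutions, insertions, and deletions relative to the number of reference words, and CER does the same over characters. The sketch below shows the standard definitions in plain Python. It is not the evaluation script used for this model, and whether that evaluation normalized casing or punctuation first is not stated here. The example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one wrong word ending in a four-word sentence.
reference = "poštovani zastupnici otvaram sjednicu"
hypothesis = "poštovani zastupnici otvaram sjednice"
print(wer(reference, hypothesis))  # 0.25 (1 of 4 words is wrong)
print(cer(reference, hypothesis))  # 1 wrong character / len(reference)
```

A single wrong character makes the whole word count as an error, which is why WER is typically several times higher than CER, as it is for this model (0.0761 versus 0.0234).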
Core Capabilities
- Croatian speech recognition with high accuracy
- Best results on parliamentary speech, the domain of its training data
- Easy integration with the Transformers library
- Works with any audio format that can be decoded and resampled to a 16 kHz mono waveform
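As an illustration of the Transformers integration, the following sketch transcribes one audio file with greedy CTC decoding. The Hub identifier in `MODEL_ID` is an assumption (this page does not state where the checkpoint is hosted), `transcribe` is a hypothetical helper name, and `librosa` is just one possible way to decode and resample audio. The heavy imports are deferred into the function so the module can be imported without loading them.

```python
# Assumed Hub identifier; adjust it to wherever the checkpoint is hosted.
MODEL_ID = "classla/wav2vec2-xls-r-parlaspeech-hr"

# wav2vec 2.0 / XLS-R models expect 16 kHz mono input.
TARGET_SAMPLE_RATE = 16_000


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Return a greedy CTC transcription of a single audio file."""
    # Deferred imports: torch, librosa, and transformers must be installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    # Decode the file, downmix to mono, and resample to 16 kHz.
    speech, _ = librosa.load(audio_path, sr=TARGET_SAMPLE_RATE, mono=True)

    inputs = processor(
        speech, sampling_rate=TARGET_SAMPLE_RATE, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: most likely token per frame; the processor then
    # collapses repeated tokens and removes CTC blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Usage (hypothetical file name):
#   text = transcribe("sjednica.wav")
```

Loading the model inside the function keeps the sketch short; for more than a handful of files, load the processor and model once and reuse them.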
Frequently Asked Questions
Q: What makes this model unique?
The model is fine-tuned specifically for Croatian ASR on 300 hours of parliamentary speech, which makes it particularly effective for formal spoken Croatian.
Q: What are the recommended use cases?
The model is best suited to transcribing Croatian speech in formal contexts such as parliamentary proceedings, official speeches, and professional settings where high accuracy is required.