Romanian Wav2Vec2
Property | Value |
---|---|
Parameter Count | 315M |
License | Apache 2.0 |
Base Model | facebook/wav2vec2-xls-r-300m |
Best WER | 7.31% (with LM) |
What is romanian-wav2vec2?
Romanian-wav2vec2 is an automatic speech recognition (ASR) model for the Romanian language. Fine-tuned from Facebook's wav2vec2-xls-r-300m checkpoint, it took first place for Romanian in Hugging Face's Robust Speech Challenge. It pairs the acoustic model with a 5-gram language model and reaches a 7.31% word error rate (WER) on the Common Voice test set.
Implementation Details
The model uses the wav2vec2 architecture with a connectionist temporal classification (CTC) head for speech recognition. It expects audio sampled at 16 kHz and includes a 5-gram language model trained on Romanian parliamentary corpora. It is available for PyTorch through the Transformers library, with weights in Safetensors format.
- Trained on the Common Voice 8.0 and Romanian Speech Synthesis datasets
- Optimizes both the acoustic model and the language model
- Achieves 7.31% WER and 2.17% character error rate (CER) with the language model
- Expects 16 kHz audio input
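As a rough sketch of how these pieces fit together, the following uses the Transformers classes `Wav2Vec2ForCTC` and `Wav2Vec2ProcessorWithLM`. The model's Hub ID is not given here, so it is left as a `model_id` parameter. `ensure_sample_rate` and `transcribe` are hypothetical helper names, and LM-boosted decoding assumes the `pyctcdecode` and `kenlm` packages are installed alongside `torch` and `transformers`.

```python
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def ensure_sample_rate(sample_rate: int) -> None:
    """Raise if the audio is not at the rate the model expects."""
    if sample_rate != TARGET_SAMPLE_RATE:
        raise ValueError(
            f"expected {TARGET_SAMPLE_RATE} Hz audio, got {sample_rate} Hz; "
            "resample first"
        )


def transcribe(waveform, sample_rate: int, model_id: str) -> str:
    """Transcribe one mono waveform (1-D float array) with LM-boosted decoding.

    model_id is the Hub ID or local path of the checkpoint (an assumption:
    it is not stated in this document). Heavy imports are kept local so the
    module can be imported without them.
    """
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

    ensure_sample_rate(sample_rate)
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # The LM processor decodes from raw logits (beam search with the n-gram
    # LM), not from argmax token ids.
    return processor.batch_decode(logits.cpu().numpy()).text[0]
```

Audio recorded at another rate would need resampling to 16 kHz before calling `transcribe`, for example with `librosa.resample` or `torchaudio.functional.resample`.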
Core Capabilities
- High-accuracy Romanian speech transcription
- Language model-boosted decoding
- Lowercased text output without punctuation
- Real-time inference support
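Because the model emits lowercased text without punctuation, reference transcripts usually need the same treatment before they are compared with its output. A minimal sketch follows, assuming that anything other than letters, digits, and whitespace counts as punctuation. How the model itself treats hyphens and apostrophes is not stated here, so that choice is an assumption, and `normalize_transcript` is a hypothetical helper name.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace.

    Romanian letters with diacritics are kept, since \\w matches Unicode
    letters for str patterns in Python 3.
    """
    text = unicodedata.normalize("NFC", text).lower()
    # Replace every character that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    # \w also matches the underscore, so remove it separately.
    text = text.replace("_", " ")
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(normalize_transcript("Bună ziua, România!"))  # bună ziua românia
```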
Frequently Asked Questions
Q: What makes this model unique?
This model took first place for Romanian in Hugging Face's Robust Speech Challenge. It combines a wav2vec2 acoustic model with a 5-gram language model and reaches a 7.31% WER on the Common Voice test set.
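For readers unfamiliar with the metric, WER is the word-level edit distance between the reference and the hypothesis divided by the number of reference words, that is (substitutions + deletions + insertions) / N. CER is the same ratio computed over characters. The sketch below is a plain-Python illustration, not the challenge's evaluation script. The function names are hypothetical, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one hypothesis against one reference."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("a b c d", "a x c d"))  # 0.25
```

Scores reported over a test set are normally aggregated by summing edits and reference lengths across all utterances, rather than averaging per-utterance ratios.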
Q: What are the recommended use cases?
The model suits Romanian speech recognition on audio sampled at 16 kHz; recordings at other rates should be resampled first. It fits applications that need accurate transcription, such as voice assistants, transcription services, and other voice-enabled applications built around Romanian-language content.