Romanian Wav2Vec2

Property	Value
Parameter Count	315M
License	Apache 2.0
Base Model	facebook/wav2vec2-xls-r-300m
Best WER	7.31% (with LM)

What is romanian-wav2vec2?

Romanian-wav2vec2 is an automatic speech recognition model for Romanian. It is built on Facebook's wav2vec2-xls-r-300m checkpoint and took first place for Romanian in HuggingFace's Robust Speech Challenge. Decoding pairs the acoustic model with a 5-gram language model, which yields the reported 7.31% WER.

Implementation Details

The model uses the wav2vec2 architecture with a CTC (connectionist temporal classification) head and expects audio sampled at 16kHz. It includes a 5-gram language model trained on Romanian parliamentary corpora. It runs on PyTorch through the Transformers library, and the weights are available in Safetensors format.
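To illustrate what a CTC head produces, here is a minimal sketch of greedy CTC decoding: take the most likely token per audio frame, merge consecutive repeats, and drop the blank token. The vocabulary, the frame scores, and the function name are hypothetical and chosen only for illustration. The model's best results rely on language-model-boosted decoding, which this sketch does not implement.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank_id: int = 0,
) -> str:
    """Collapse per-frame scores into text using the CTC rule."""
    # Pick the highest-scoring token index for each frame.
    best_ids = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]

    chars: List[str] = []
    prev = None
    for token_id in best_ids:
        # Skip repeats of the previous frame's token, and skip blanks.
        if token_id != prev and token_id != blank_id:
            chars.append(vocab[token_id])
        prev = token_id
    return "".join(chars)


# Hypothetical toy vocabulary: index 0 plays the role of the CTC blank.
VOCAB = ["<blank>", "d", "a", " "]

# Hypothetical per-frame scores over the toy vocabulary.
FRAMES = [
    [0.10, 0.80, 0.05, 0.05],  # d
    [0.10, 0.70, 0.10, 0.10],  # d again (merged with the previous frame)
    [0.90, 0.00, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.70, 0.10],  # a again (merged with the previous frame)
]

print(ctc_greedy_decode(FRAMES, VOCAB))  # da
```

The blank token is what lets CTC emit genuine double letters: two identical tokens separated by a blank frame are kept as two characters.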

Trained on Common Voice 8.0 and Romanian Speech Synthesis datasets
Includes both acoustic model and language model optimization
Achieves 7.31% WER and 2.17% CER with language model
Supports 16kHz audio input processing
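A minimal sketch of how the model might be called through the Transformers pipeline API. The repository name `gigant/romanian-wav2vec2` is an assumption, since this page does not state the Hub identifier, and `transcribe` is a hypothetical helper. Language-model decoding through the pipeline generally also requires the `pyctcdecode` and `kenlm` packages to be installed.

```python
# Assumed Hub repository name; this page does not state it.
MODEL_ID = "gigant/romanian-wav2vec2"


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with the Transformers ASR pipeline."""
    # Imported lazily so this module can be loaded without the heavy dependency.
    from transformers import pipeline

    # Downloads the model on first use and builds the recognizer.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # The pipeline returns a dict whose "text" entry holds the transcript.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` on a local recording (a hypothetical file name) would return the transcript as a lowercase string without punctuation. For repeated calls, build the pipeline once and reuse it rather than reloading the model each time.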

Core Capabilities

High-accuracy Romanian speech transcription
Language model-boosted decoding
Lowercased text output without punctuation
Real-time inference support
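Because the output is lowercased and unpunctuated, reference text usually needs the same treatment before it is compared with a transcript. The sketch below shows one way to do that. The exact normalization rules used for this model are not stated here, so the choices made (replacing punctuation with spaces, keeping diacritics) are assumptions, and `normalize_transcript` is a hypothetical helper.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> str:
    """Lowercase text and strip punctuation, keeping Romanian diacritics."""
    # Compose characters so that letters like "ă" are single code points.
    text = unicodedata.normalize("NFC", text).lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(normalize_transcript("Bună ziua, România!"))  # bună ziua românia
```

Note that this sketch splits hyphenated forms such as "s-a" into two tokens, which may or may not match the model's vocabulary.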

Frequently Asked Questions

Q: What makes this model unique?

This model took first place for Romanian in HuggingFace's Robust Speech Challenge. It combines a wav2vec2 acoustic model with a 5-gram language model and reaches 7.31% WER on the Common Voice test set.
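For readers unfamiliar with the metrics, word error rate (WER) is the word-level edit distance between a reference and a transcript, divided by the number of reference words. Character error rate (CER) is the same calculation over characters. The following is a minimal sketch for a single utterance, not the evaluation script behind the reported figures. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            row.append(
                min(
                    prev_row[j] + 1,         # deletion
                    row[j - 1] + 1,          # insertion
                    prev_row[j - 1] + cost,  # substitution or match
                )
            )
        prev_row = row
    return prev_row[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one reference/hypothesis pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one reference/hypothesis pair."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One dropped word out of three reference words.
print(wer("ce mai faci", "ce faci"))
```

Corpus-level scores are usually computed by summing edits and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.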

Q: What are the recommended use cases?

The model is intended for Romanian speech recognition on audio sampled at 16kHz. It suits applications that need accurate transcription of Romanian speech, such as voice assistants, transcription services, and other voice-enabled applications.
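Audio recorded at another rate has to be converted to 16kHz before it reaches the model. The sketch below shows the idea with plain linear interpolation. It is deliberately naive: it applies no anti-aliasing filter, so a dedicated audio library is the better choice in practice. `resample_linear` and `TARGET_RATE` are hypothetical names.

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate the model expects, in Hz


def resample_linear(
    samples: Sequence[float],
    source_rate: int,
    target_rate: int = TARGET_RATE,
) -> List[float]:
    """Resample a mono signal by linear interpolation (no low-pass filter)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if source_rate == target_rate or len(samples) == 0:
        return list(samples)

    out_len = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples per output sample
    last = len(samples) - 1

    out: List[float] = []
    for n in range(out_len):
        pos = n * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# One second of silence at 48kHz becomes one second at 16kHz.
print(len(resample_linear([0.0] * 48_000, 48_000)))  # 16000
```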