nb-wav2vec2-1b-bokmaal
| Property | Value |
|---|---|
| Parameter Count | 963M |
| License | Apache 2.0 |
| Paper | Research Paper |
| WER Score | 6.33% (with KenLM) |
| CER Score | 2.48% (with KenLM) |
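
For readers unfamiliar with the two metrics in the table: WER and CER are edit-distance ratios. Each counts the substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, at word level or character level, and divides by the reference length. The sketch below is a minimal pure-Python illustration. The function names and the sample sentences are invented for this example, and this is not the evaluation script that produced the reported scores.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] is the distance between the reference prefix seen so far
    # (before the current item) and the first j items of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative sentences only, not taken from any evaluation set.
reference = "det er en fin dag"
hypothesis = "det er en fin da"
print(f"WER: {wer(reference, hypothesis):.3f}")  # 1 wrong word out of 5
print(f"CER: {cer(reference, hypothesis):.3f}")  # 1 missing character out of 17
```

A single dropped character costs a full word at word level but only one of 17 characters at character level, which is why CER is typically much lower than WER for the same output.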
What is nb-wav2vec2-1b-bokmaal?
nb-wav2vec2-1b-bokmaal is a Norwegian automatic speech recognition (ASR) model for Bokmål, developed by NbAiLab. It is built on Facebook's XLS-R architecture and was developed during the Hugging Face Robust Speech Event. When decoding with KenLM, it reaches a 6.33% Word Error Rate (WER) and a 2.48% Character Error Rate (CER).
Implementation Details
The model is fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC), starting from the wav2vec2-xls-r-1b checkpoint. Decoding uses a 5-gram KenLM language model to improve accuracy, and training applies layerdrop, attention dropout, and activation dropout for regularization.
- Base Architecture: wav2vec2-xls-r-1b
- Training Duration: 40 epochs
- Optimization: FP16 training with gradient checkpointing
- Dropout Configuration: Layerdrop (0.041), Attention (0.094), Activation (0.055)
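
The settings listed above correspond to options in the Hugging Face `transformers` library, and the sketch below shows one way they might be wired together. It assumes `transformers` is installed and that the base checkpoint is available on the Hub as `facebook/wav2vec2-xls-r-1b`. The function name and output directory are hypothetical. Learning rate, batch size, vocabulary size, and other settings are not given in this document, so they are left at library defaults here rather than guessed. This is not the authors' training script.

```python
# Values copied from the list above; nothing else is specified in this document.
MODEL_OVERRIDES = {
    "layerdrop": 0.041,
    "attention_dropout": 0.094,
    "activation_dropout": 0.055,
}

TRAINING_OVERRIDES = {
    "num_train_epochs": 40,
    "fp16": True,                    # requires a GPU with FP16 support
    "gradient_checkpointing": True,  # trades compute for lower memory use
}

BASE_CHECKPOINT = "facebook/wav2vec2-xls-r-1b"


def build_model_and_args(output_dir: str = "./nb-wav2vec2-finetune"):
    """Load the base checkpoint with the dropout overrides and build training args.

    Hypothetical helper: a real fine-tuning run would also set the CTC
    vocabulary size and padding token from its own tokenizer.
    """
    # Imported lazily so the settings above can be inspected without
    # transformers installed and without downloading the checkpoint.
    from transformers import TrainingArguments, Wav2Vec2ForCTC

    # Keyword arguments that match config fields override the stored config.
    model = Wav2Vec2ForCTC.from_pretrained(BASE_CHECKPOINT, **MODEL_OVERRIDES)
    args = TrainingArguments(output_dir=output_dir, **TRAINING_OVERRIDES)
    return model, args
```

Calling `build_model_and_args()` downloads the base checkpoint, so it is only worth running on a machine with enough disk space and GPU memory for a model of this size.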
Core Capabilities
- High-accuracy Bokmål speech recognition with 6.33% WER
- Handles audio inputs between 0.5 and 30 seconds long
- Integrated language model support
- Robust performance on parliamentary speech data
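
As a rough illustration of these capabilities, the sketch below cuts a recording into pieces that fit the 0.5 to 30 second range and transcribes each piece with the `transformers` ASR pipeline. Several things here are assumptions: the Hub ID `NbAiLab/nb-wav2vec2-1b-bokmaal` is inferred from the model and organization names in this document, the input is taken to be 16 kHz mono audio as a 1-D NumPy array, and language-model decoding additionally needs the `pyctcdecode` and `kenlm` packages. The helper names are hypothetical.

```python
MODEL_ID = "NbAiLab/nb-wav2vec2-1b-bokmaal"  # assumed Hub ID, see lead-in
SAMPLE_RATE = 16_000                          # assumed input sample rate in Hz
MIN_SECONDS, MAX_SECONDS = 0.5, 30.0          # duration range stated above


def split_into_segments(num_samples, sample_rate=SAMPLE_RATE,
                        min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Return (start, end) sample offsets of consecutive segments.

    Each segment is at most max_seconds long. A trailing piece shorter
    than min_seconds is dropped.
    """
    max_len = int(max_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    segments = []
    for start in range(0, num_samples, max_len):
        end = min(start + max_len, num_samples)
        if end - start >= min_len:
            segments.append((start, end))
    return segments


def transcribe(waveform, sample_rate=SAMPLE_RATE):
    """Transcribe a 1-D float waveform segment by segment."""
    # Imported lazily: building the pipeline downloads the full model.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    texts = []
    for start, end in split_into_segments(len(waveform), sample_rate):
        result = asr({"raw": waveform[start:end], "sampling_rate": sample_rate})
        texts.append(result["text"])
    return " ".join(texts)
```

Cutting at fixed offsets can split a word across two segments. For long recordings, cutting at silences or using the pipeline's own `chunk_length_s` and `stride_length_s` options may give cleaner results.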
Frequently Asked Questions
Q: What makes this model unique?
This model is reported as state of the art for Norwegian Bokmål speech recognition, reaching 6.33% WER and 2.48% CER when decoding with KenLM. It combines a 963M-parameter XLS-R model fine-tuned on NPSC with a 5-gram language model.
Q: What are the recommended use cases?
The model is best suited to Norwegian Bokmål transcription in formal settings such as parliamentary speeches, presentations, and professional audio content. It is optimized for audio segments between 0.5 and 30 seconds long.