nb-wav2vec2-1b-bokmaal

Property	Value
Parameter Count	963M
License	Apache 2.0
Paper	Research Paper
WER Score	6.33% (with KenLM)
CER Score	2.48% (with KenLM)
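
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. The function names and the example sentences are illustrative assumptions, and the exact text normalization used to obtain the reported scores is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: spaces count as characters; some toolkits strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong word out of five.
reference = "stortinget vedtok loven i dag"
hypothesis = "stortinget vedtok lover i dag"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

A single substituted word out of five gives a WER of 20%, while the CER is much lower because only one of 29 characters differs.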

What is nb-wav2vec2-1b-bokmaal?

nb-wav2vec2-1b-bokmaal is a Norwegian automatic speech recognition (ASR) model developed by NbAiLab for Bokmål. It is fine-tuned from Facebook's XLS-R model (wav2vec2-xls-r-1b) and was developed during the Hugging Face Robust Speech Event. When decoded with a KenLM language model, it reports a 6.33% Word Error Rate (WER) and a 2.48% Character Error Rate (CER).

Implementation Details

The model is fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC), starting from the wav2vec2-xls-r-1b checkpoint. Decoding is paired with a 5-gram KenLM language model to improve accuracy, and training applies several forms of dropout as regularization.

Base Architecture: wav2vec2-xls-r-1b
Training Duration: 40 epochs
Optimization: FP16 training with gradient checkpointing
Dropout Configuration: Layerdrop (0.041), Attention (0.094), Activation (0.055)
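
As a rough sketch, the settings listed above could be expressed with Hugging Face Transformers as shown below. The keyword names (layerdrop, attention_dropout, activation_dropout, num_train_epochs, fp16, gradient_checkpointing) are real Wav2Vec2Config and TrainingArguments options, but the helper functions, the output directory argument, and the assumption that the original training script passed the values this way are illustrative only. Hyperparameters not listed above (learning rate, batch size, and so on) are deliberately left at library defaults.

```python
# Base checkpoint on the Hugging Face Hub for the architecture named above.
BASE_CHECKPOINT = "facebook/wav2vec2-xls-r-1b"

# Dropout values copied from the list above.
MODEL_OVERRIDES = {
    "layerdrop": 0.041,
    "attention_dropout": 0.094,
    "activation_dropout": 0.055,
}

# Training duration and memory/speed optimizations from the list above.
TRAINING_OVERRIDES = {
    "num_train_epochs": 40,
    "fp16": True,                    # needs a GPU that supports half precision
    "gradient_checkpointing": True,  # trades extra compute for lower memory use
}


def build_model_config():
    """Load the base config and override the dropout settings (hypothetical helper)."""
    from transformers import AutoConfig  # imported lazily; requires `transformers`

    return AutoConfig.from_pretrained(BASE_CHECKPOINT, **MODEL_OVERRIDES)


def build_training_arguments(output_dir):
    """Create TrainingArguments with the listed settings (hypothetical helper)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_OVERRIDES)
```

Keeping the overrides in plain dictionaries makes it easy to log or diff them against another run before any model weights are downloaded.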

Core Capabilities

Bokmål speech recognition with 6.33% WER and 2.48% CER (with KenLM)
Handles audio inputs between 0.5 and 30 seconds long
5-gram KenLM language model support for decoding
Fine-tuned on parliamentary speech data (NPSC)
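
Because the model targets inputs between 0.5 and 30 seconds, longer recordings need to be split before transcription. The sketch below shows one simple way to do that on sample indices, assuming 16 kHz audio (the rate wav2vec 2.0 models expect). The function name and the fixed-boundary strategy are assumptions for illustration; cutting at fixed positions can split a word in two, so a real pipeline would more likely cut at pauses or use overlapping chunks.

```python
SAMPLE_RATE = 16_000  # wav2vec 2.0 / XLS-R models expect 16 kHz input
MIN_SECONDS = 0.5     # lower bound of the duration window stated above
MAX_SECONDS = 30.0    # upper bound of the duration window stated above


def split_into_segments(num_samples, sample_rate=SAMPLE_RATE,
                        min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Return (start, end) sample ranges that each fit the duration window.

    Consecutive segments of at most `max_seconds` are produced; a trailing
    remainder shorter than `min_seconds` is dropped.
    """
    max_len = int(max_seconds * sample_rate)
    min_len = int(min_seconds * sample_rate)
    segments = []
    start = 0
    while start < num_samples:
        end = min(start + max_len, num_samples)
        if end - start >= min_len:
            segments.append((start, end))
        start = end
    return segments


# A 70-second recording becomes two 30-second segments and one 10-second segment.
print(split_into_segments(70 * SAMPLE_RATE))
```

Each returned range can then be sliced out of the waveform array and transcribed separately, with the partial transcripts joined afterwards.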

Frequently Asked Questions

Q: What makes this model unique?

The model is presented as state-of-the-art for Norwegian Bokmål speech recognition, with a reported 6.33% WER and 2.48% CER when decoded with its 5-gram KenLM language model. It combines a large pretrained backbone (wav2vec2-xls-r-1b, 963M parameters) with fine-tuning on Norwegian parliamentary speech and language-model decoding.

Q: What are the recommended use cases?

The model is well suited to Norwegian Bokmål transcription, particularly in formal settings that resemble its training data, such as parliamentary speeches, presentations, and professional audio content. It is optimized for audio segments between 0.5 and 30 seconds long.
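
For a transcription task like those described above, a minimal inference sketch with the Hugging Face Transformers pipeline might look as follows. The Hub repository name is assumed from the model and organization names, and the transcribe helper and the speech.wav file name are hypothetical. Decoding a file path requires ffmpeg, and language-model decoding is only used if the repository ships a KenLM decoder and the pyctcdecode and kenlm packages are installed.

```python
# Assumed Hub repository name, built from the organization and model names.
MODEL_ID = "NbAiLab/nb-wav2vec2-1b-bokmaal"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one short audio file (ideally 0.5 to 30 seconds) to text."""
    # Imported lazily so the module loads even without `transformers` installed.
    from transformers import pipeline

    # Downloads the model weights on first use.
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()
```

Calling transcribe("speech.wav") on a short Bokmål recording returns the recognized text as a string. For many files, it is cheaper to create the pipeline once and reuse it rather than rebuilding it on every call.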