nb-wav2vec2-1b-nynorsk

NbAiLab

Large-scale Norwegian ASR model (1B parameters) for the Nynorsk written standard, reaching 11.32% WER with a 5-gram KenLM language model. Built on XLS-R and fine-tuned on the NPSC dataset.

Property	Value
Model Type	Speech Recognition (ASR)
Base Architecture	Wav2Vec2 XLS-R 1B
Language	Norwegian (Nynorsk)
WER Score	11.32% (with KenLM)
Research Paper	arXiv:2307.01672

What is nb-wav2vec2-1b-nynorsk?

nb-wav2vec2-1b-nynorsk is a Norwegian speech recognition model trained to produce transcripts in the Nynorsk written standard. Built by NbAiLab, it is based on Facebook/Meta's XLS-R architecture and has been fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC). It reaches a Word Error Rate (WER) of 11.32% when decoding with a 5-gram KenLM language model.

Implementation Details

The model uses the XLS-R 1B architecture and was fine-tuned with hyperparameters chosen for Norwegian speech recognition. Training ran for 40 epochs, with dropout rates and masking probabilities tuned per layer and per dimension. To keep training efficient, the feature encoder was frozen and gradient checkpointing was enabled.

Learning rate: 2e-5, with 2000 warmup steps
Batch size: 12, with 2 gradient accumulation steps
Dropout rates tuned separately for different layers
Feature masking with separate probabilities for the time and feature dimensions
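
The hyperparameters above can be made concrete with a small sketch. It computes the effective batch size implied by gradient accumulation and the learning rate at a given step. Only the values 2e-5, 2000, 12, and 2 come from the list. The single-device default, the linear decay after warmup, and the total step count used in the usage lines are assumptions for illustration, not confirmed details of the original training run.

```python
# Values taken from the hyperparameter list above.
LEARNING_RATE = 2e-5
WARMUP_STEPS = 2000
BATCH_SIZE = 12
GRAD_ACCUM_STEPS = 2


def effective_batch_size(batch_size: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer update.

    num_devices=1 is an assumption; the source does not say how many devices were used.
    """
    return batch_size * accum_steps * num_devices


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = LEARNING_RATE,
               warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then linear decay to zero.

    The decay shape is assumed; the source only states the peak rate and warmup length.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total = 10_000  # hypothetical total number of optimizer steps
    print(effective_batch_size(BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 1000, 2000, 6000, 10_000):
        print(s, lr_at_step(s, total))
```

With a batch size of 12 and 2 accumulation steps, each optimizer update sees 24 samples per device, which is how a 1B-parameter model can be trained with a larger effective batch than fits in memory at once.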

Core Capabilities

Nynorsk speech recognition with 11.32% WER (with the 5-gram KenLM language model)
Character Error Rate (CER) of 4.02%
Handles audio durations between 0.5 and 30 seconds
Integrated language model support for improved accuracy
Optimized for Norwegian parliamentary speech
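
For readers unfamiliar with the WER and CER figures quoted above, the following sketch shows how such metrics are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. This is a minimal illustration and not the evaluation script behind the reported numbers. Text normalization and whether spaces count as characters vary between toolkits, and the example sentences are invented.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over the reference length.

    Spaces are counted as characters here, which is one of several conventions.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "eg har ikkje sett han"  # invented Nynorsk reference
    hyp = "eg har ikke sett han"   # hypothesis with one wrong word
    print(f"WER: {wer(ref, hyp):.2%}")
    print(f"CER: {cer(ref, hyp):.2%}")
```

One wrong word out of five gives a WER of 20%, while the same mistake costs only one character out of 21, which is why CER is normally much lower than WER for the same output.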

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically optimized for Norwegian Nynorsk, one of Norway's two official written standards. It belongs to a family of models that reduced the word error rate for Norwegian speech recognition from 17.10% to as low as 7.60%; this Nynorsk model itself reaches 11.32%.

Q: What are the recommended use cases?

The model is best suited for transcribing formal Norwegian speech into the Nynorsk written standard. It performs best on clear speech similar to parliamentary recordings but can be adapted for other use cases. For best results, use it with the recommended 5-gram KenLM language model.
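
A minimal usage sketch, assuming the model is published on the Hugging Face Hub under the id NbAiLab/nb-wav2vec2-1b-nynorsk (inferred from the title and publisher above, not stated in this text) and that the transformers library is installed. The helper names are hypothetical. The duration check reuses the 0.5 to 30 second range from the capabilities list. Decoding with the KenLM language model additionally requires the pyctcdecode and kenlm packages, and reading audio from a file path through the pipeline requires ffmpeg.

```python
import wave

# Assumed Hub id, inferred from the title and publisher; verify before use.
MODEL_ID = "NbAiLab/nb-wav2vec2-1b-nynorsk"
# Duration range quoted in the capabilities list.
MIN_SECONDS, MAX_SECONDS = 0.5, 30.0


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, read with the standard library only."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_supported_duration(seconds: float) -> bool:
    """True if the clip length falls inside the stated 0.5 to 30 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def transcribe(path: str) -> str:
    """Transcribe one WAV file, rejecting clips outside the stated duration range."""
    seconds = wav_duration_seconds(path)
    if not within_supported_duration(seconds):
        raise ValueError(
            f"{path} is {seconds:.2f}s long; expected {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    return asr(path)["text"]
```

Calling transcribe("speech.wav") with a hypothetical file downloads the model on first use and returns the transcript as a string. Longer recordings would need to be split into segments before being passed in.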