nb-wav2vec2-1b-nynorsk

NbAiLab

Large-scale Norwegian ASR model (1B parameters) for the Nynorsk written standard, achieving 11.32% WER with a KenLM language model. Built on XLS-R and fine-tuned on the NPSC dataset.

Property            Value
Model Type          Speech Recognition (ASR)
Base Architecture   Wav2Vec2 XLS-R 1B
Language            Norwegian (Nynorsk)
WER Score           11.32% (with KenLM)
Research Paper      arXiv:2307.01672

What is nb-wav2vec2-1b-nynorsk?

nb-wav2vec2-1b-nynorsk is a Norwegian speech recognition model trained to transcribe speech into the Nynorsk written standard. Built by NbAiLab, it is based on Meta's (formerly Facebook's) 1-billion-parameter XLS-R model and was fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC). It reaches a Word Error Rate (WER) of 11.32% when decoded with a 5-gram KenLM language model.
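For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function names and the example sentences are illustrative assumptions, not the evaluation code used for this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of words in the reference."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


# Hypothetical example: one substituted word out of five reference words.
example_wer = word_error_rate("eg vil takke for ordet", "eg vil takka for ordet")
print(f"WER: {example_wer:.2%}")
```

A character error rate can be sketched the same way by passing the strings themselves (sequences of characters) to `edit_distance` instead of word lists.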

Implementation Details

The model fine-tunes the XLS-R 1B checkpoint for 40 epochs. The feature encoder is frozen during training, and gradient checkpointing is used to reduce memory consumption at the cost of extra computation. Dropout rates and masking probabilities were tuned for Norwegian speech recognition. Key training settings:

  • Learning rate: 2e-5 with 2000 warmup steps
  • Batch size: 12 with gradient accumulation steps of 2
  • Dropout rates tuned separately for different layers
  • Masking applied along both the time and feature dimensions, each with its own probability
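The listed learning-rate and batch settings can be made concrete with a small sketch. It assumes that the batch size of 12 is per device and that warmup is linear, neither of which the text above states; the schedule after warmup (for example, any decay) is not modeled here.

```python
# Values taken from the list above.
LEARNING_RATE = 2e-5
WARMUP_STEPS = 2000
BATCH_SIZE = 12                  # assumption: this is the per-device batch size
GRADIENT_ACCUMULATION_STEPS = 2


def effective_batch_size(num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def warmup_learning_rate(step):
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to the peak rate. After warmup this
    sketch simply holds the peak rate; the real schedule may decay it.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= WARMUP_STEPS:
        return LEARNING_RATE
    return LEARNING_RATE * step / WARMUP_STEPS


print("Effective batch size on one device:", effective_batch_size())
for step in (0, 500, 1000, 2000):
    print(f"step {step:>4}: lr = {warmup_learning_rate(step):.2e}")
```

Gradient accumulation trades wall-clock time for memory: two forward and backward passes of 12 examples each are summed before a single optimizer step, which behaves like a batch of 24 on one device.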

Core Capabilities

  • High-accuracy Nynorsk speech recognition with 11.32% WER
  • Character Error Rate (CER) of 4.02%
  • Handles audio durations between 0.5 and 30 seconds
  • Integrated language model support for improved accuracy
  • Optimized for Norwegian parliamentary speech
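The 0.5 to 30 second duration range above can be checked before audio is sent to the model. The sketch below is one way to do this; the 16 kHz sample rate is an assumption based on what Wav2Vec2-family models typically expect, the inclusive bounds are an assumption, and the helper names are hypothetical.

```python
MIN_DURATION_SECONDS = 0.5
MAX_DURATION_SECONDS = 30.0
SAMPLE_RATE_HZ = 16_000  # assumption: audio is resampled to 16 kHz mono


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Duration of a mono clip given its sample count."""
    return num_samples / sample_rate


def is_supported_duration(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """True if the clip length falls inside the stated range (bounds treated as inclusive)."""
    seconds = duration_seconds(num_samples, sample_rate)
    return MIN_DURATION_SECONDS <= seconds <= MAX_DURATION_SECONDS


# Hypothetical clips described by their sample counts at 16 kHz.
clips = {"too_short": 4_000, "ok": 160_000, "too_long": 16_000 * 45}
for name, num_samples in clips.items():
    print(name, duration_seconds(num_samples), is_supported_duration(num_samples))
```

Longer recordings would need to be split into segments within this range before transcription.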

Frequently Asked Questions

Q: What makes this model unique?

This model targets Norwegian Nynorsk, one of Norway's two official written standards (the other is Bokmål). It belongs to a family of models that reduced error rates for Norwegian speech recognition from 17.10% to as low as 7.60%.

Q: What are the recommended use cases?

The model is best suited to transcribing formal Norwegian speech into Nynorsk. It performs best on clear speech similar to the parliamentary recordings it was trained on; other domains may require further fine-tuning. For best results, use it with the recommended 5-gram KenLM language model.
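A minimal transcription sketch using the Hugging Face `transformers` pipeline is shown below. The Hub identifier `NbAiLab/nb-wav2vec2-1b-nynorsk` and the file name are assumptions. Language-model decoding is expected to apply only if the repository ships the KenLM files and the `pyctcdecode` and `kenlm` packages are installed; otherwise the pipeline falls back to plain CTC decoding.

```python
# Assumed Hub identifier, built from the publisher and model names in this document.
MODEL_ID = "NbAiLab/nb-wav2vec2-1b-nynorsk"


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file and return the recognized text.

    Requires the `transformers` and `torch` packages, plus ffmpeg for
    decoding audio files. The first call downloads the model weights.
    """
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)
    return result["text"]


if __name__ == "__main__":
    # "speech.wav" is a hypothetical local recording between 0.5 and 30 seconds long.
    print(transcribe("speech.wav"))
```

For repeated use, building the pipeline once and reusing it avoids reloading the 1B-parameter model on every call.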
