nb-wav2vec2-1b-nynorsk

Maintained By
NbAiLab


Property            Value
Model Type          Speech Recognition (ASR)
Base Architecture   Wav2Vec2 XLS-R 1B
Language            Norwegian (Nynorsk)
WER Score           11.32% (with KenLM)
Research Paper      arXiv:2307.01672

What is nb-wav2vec2-1b-nynorsk?

nb-wav2vec2-1b-nynorsk is a Norwegian automatic speech recognition (ASR) model trained to transcribe speech into the Nynorsk written standard. It was built by NbAiLab on top of Meta's (formerly Facebook's) XLS-R 1B model and fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC). When its output is decoded with a 5-gram KenLM language model, it reaches a Word Error Rate (WER) of 11.32%.
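Without a language model, the output of a CTC-trained wav2vec2 model can be decoded greedily: take the most likely token for each audio frame, merge repeated tokens, and drop the blank token. The sketch below assumes the common Hugging Face wav2vec2 vocabulary conventions (`<pad>` as the CTC blank, `|` as the word separator); these token names and the function `ctc_greedy_decode` are illustrative assumptions, not read from this model's tokenizer files.

```python
from itertools import groupby

BLANK = "<pad>"  # assumed CTC blank token (Hugging Face wav2vec2 convention)
WORD_SEP = "|"   # assumed word-delimiter token


def ctc_greedy_decode(frame_tokens: list[str], blank: str = BLANK,
                      word_sep: str = WORD_SEP) -> str:
    """Turn per-frame argmax tokens into text with greedy CTC decoding."""
    # 1. Merge runs of the same token emitted on consecutive frames.
    collapsed = [token for token, _ in groupby(frame_tokens)]
    # 2. Remove blanks; a blank between two identical tokens keeps them distinct.
    kept = [token for token in collapsed if token != blank]
    # 3. Replace the word delimiter with a space and normalize whitespace.
    text = "".join(" " if token == word_sep else token for token in kept)
    return " ".join(text.split())


frames = ["e", "e", "<pad>", "g", "|", "<pad>", "e", "r", "r"]
print(ctc_greedy_decode(frames))  # → eg er
```

The blank token is what lets CTC emit genuine double letters: `["l", "<pad>", "l"]` decodes to "ll", while `["l", "l"]` collapses to "l".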

Implementation Details

The model starts from the XLS-R 1B checkpoint and was fine-tuned for 40 epochs with hyperparameters chosen for Norwegian speech recognition, including tuned dropout rates and masking probabilities. During training, the convolutional feature encoder was frozen and gradient checkpointing was enabled to reduce memory use.

  • Learning rate: 2e-5 with 2000 warmup steps
  • Batch size: 12, with 2 gradient accumulation steps
  • Dropout rates tuned for different layers of the network
  • Masking along the time and feature dimensions, each with its own probability
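The learning-rate settings above can be sketched as a linear warmup to 2e-5 over 2000 steps. What happens after warmup is not stated here, so this sketch assumes a linear decay to zero (the shape of the Hugging Face `get_linear_schedule_with_warmup` scheduler) and training on a single device, which makes the effective batch size 12 × 2 = 24. The helper names and the 50,000-example dataset size are hypothetical.

```python
import math

PEAK_LR = 2e-5        # from the card
WARMUP_STEPS = 2000   # from the card
PER_STEP_BATCH = 12   # from the card
GRAD_ACCUM_STEPS = 2  # from the card
EPOCHS = 40           # from the card


def effective_batch_size(per_step_batch: int = PER_STEP_BATCH,
                         accum_steps: int = GRAD_ACCUM_STEPS,
                         num_devices: int = 1) -> int:
    """Examples per optimizer update (num_devices=1 is an assumption)."""
    return per_step_batch * accum_steps * num_devices


def total_optimizer_steps(num_examples: int, epochs: int = EPOCHS) -> int:
    """Approximate update count; frameworks differ in how they round partial batches."""
    return math.ceil(num_examples / effective_batch_size()) * epochs


def linear_warmup_decay_lr(step: int, total_steps: int,
                           peak_lr: float = PEAK_LR,
                           warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to peak_lr, then (assumed) linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total = total_optimizer_steps(num_examples=50_000)  # hypothetical dataset size
for step in (0, 1000, 2000, total // 2, total):
    print(step, f"{linear_warmup_decay_lr(step, total):.2e}")
```

Warmup keeps early updates small while the newly initialized CTC output layer is still random, which helps avoid disturbing the pretrained XLS-R weights.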

Core Capabilities

  • Nynorsk transcription at 11.32% WER when decoded with the 5-gram KenLM model
  • Character Error Rate (CER) of 4.02%
  • Handles audio durations between 0.5 and 30 seconds
  • Support for decoding with an n-gram (KenLM) language model to improve accuracy
  • Optimized for Norwegian parliamentary speech
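WER and CER are both edit-distance ratios: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length, counted in words for WER and in characters for CER. The minimal corpus-level sketch below uses made-up Nynorsk sentences. The text normalization behind the reported scores (casing, punctuation, whether spaces count toward CER) is not stated here; this sketch counts spaces as characters.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references: list[str], hypotheses: list[str], unit: str = "word") -> float:
    """Corpus-level WER (unit="word") or CER (unit="char"): total edits / total reference length."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    tokenize = str.split if unit == "word" else list
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(references, hypotheses))
    length = sum(len(tokenize(r)) for r in references)
    return edits / length


refs = ["eg er her"]
hyps = ["eg var her"]
print(f"WER = {error_rate(refs, hyps):.2%}")              # → WER = 33.33%
print(f"CER = {error_rate(refs, hyps, unit='char'):.2%}")  # → CER = 22.22%
```

Summing edits over the whole corpus before dividing weights long utterances more heavily than averaging per-utterance rates would.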

Frequently Asked Questions

Q: What makes this model unique?

This model targets Norwegian Nynorsk, one of Norway's two official written standards. It belongs to a family of models described in the accompanying paper (arXiv:2307.01672) that lowered the WER on NPSC from the previous best of 17.10% to as low as 7.60%.
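For scale, the drop from 17.10% to 7.60% is 9.5 percentage points, or roughly a 56% relative reduction in WER:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - improved) / baseline


print(f"{relative_reduction(17.10, 7.60):.1%}")  # → 55.6%
```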

Q: What are the recommended use cases?

The model suits transcription of formal Norwegian speech into Nynorsk. It performs best on clear speech similar to the parliamentary recordings it was trained on, and it can be fine-tuned further for other domains. For the best accuracy, decode its output with the recommended 5-gram KenLM language model.
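Decoding with an n-gram language model is usually done by shallow fusion: each candidate transcript is scored by its acoustic log-probability, plus a weighted LM log-probability, plus a per-word bonus. The sketch below rescores a fixed n-best list, which simplifies the beam search that CTC decoders such as pyctcdecode actually run. `toy_lm` is a hypothetical stand-in for the 5-gram KenLM model, and the weights `alpha` and `beta` are illustrative values, not this model's tuned settings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    text: str
    acoustic_logprob: float  # log-probability assigned by the CTC acoustic model


def fused_score(hyp: Hypothesis, lm_logprob: Callable[[str], float],
                alpha: float, beta: float) -> float:
    """Shallow fusion: acoustic score + alpha * LM score + beta * word count."""
    return hyp.acoustic_logprob + alpha * lm_logprob(hyp.text) + beta * len(hyp.text.split())


def rescore(hyps: list[Hypothesis], lm_logprob: Callable[[str], float],
            alpha: float = 0.5, beta: float = 1.0) -> Hypothesis:
    """Pick the n-best candidate with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(h, lm_logprob, alpha, beta))


# Hypothetical stand-in for a 5-gram KenLM model: whole-sentence log-probabilities.
toy_lm = {"eg er her": -4.0, "eg er hær": -9.0}


def lm(text: str) -> float:
    return toy_lm.get(text, -100.0)  # unseen sentences get a very low score


nbest = [Hypothesis("eg er hær", -1.0), Hypothesis("eg er her", -1.5)]
print(rescore(nbest, lm).text)             # → eg er her  (LM overrides the acoustic choice)
print(rescore(nbest, lm, alpha=0.0).text)  # → eg er hær  (acoustic score alone)
```

The word bonus `beta` offsets the LM's tendency to prefer shorter transcripts, since every added word lowers the LM log-probability.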
