nb-wav2vec2-1b-bokmaal

NbAiLab

A Norwegian Bokmål ASR model with 963M parameters, reaching 6.33% WER when decoded with a KenLM language model. Built on the XLS-R architecture.

Property         Value
Parameter Count  963M
License          Apache 2.0
Paper            Research Paper
WER Score        6.33% (with KenLM)
CER Score        2.48% (with KenLM)

What is nb-wav2vec2-1b-bokmaal?

nb-wav2vec2-1b-bokmaal is a Norwegian speech recognition model developed by NbAiLab for the Bokmål written standard. It is built on Facebook's XLS-R architecture and was developed during the Hugging Face Robust Speech Event. With KenLM language-model decoding it reaches a 6.33% Word Error Rate (WER) and a 2.48% Character Error Rate (CER).
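To make the WER figure concrete, here is a minimal sketch of how a word error rate is usually computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and the example sentences are illustrative assumptions, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four.
example_wer = word_error_rate("jeg bor i oslo", "jeg bor i bergen")
```

Under this definition, a 6.33% WER corresponds to roughly six word-level errors per hundred reference words. The character error rate follows the same idea with characters in place of words.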

Implementation Details

The model is fine-tuned on the Norwegian Parliamentary Speech Corpus (NPSC), starting from the wav2vec2-xls-r-1b checkpoint. Decoding uses a 5-gram KenLM language model to improve accuracy, and training applies layerdrop, attention dropout, and activation dropout for regularization.

  • Base Architecture: wav2vec2-xls-r-1b
  • Training Duration: 40 epochs
  • Optimization: FP16 training with gradient checkpointing
  • Dropout Configuration: Layerdrop (0.041), Attention (0.094), Activation (0.055)
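The settings listed above could be collected in a small configuration fragment such as the following sketch. The values come from the list; the key names follow the Hugging Face `Wav2Vec2Config` and `TrainingArguments` naming, and the `facebook/` prefix on the checkpoint id is an assumption, since the text only names `wav2vec2-xls-r-1b`. The actual training script is not shown in this document.

```python
# Assumed Hub id for the base checkpoint named in the text.
BASE_CHECKPOINT = "facebook/wav2vec2-xls-r-1b"

# Dropout settings, using Wav2Vec2Config-style key names (assumption).
MODEL_CONFIG_OVERRIDES = {
    "layerdrop": 0.041,
    "attention_dropout": 0.094,
    "activation_dropout": 0.055,
}

# Training settings, using TrainingArguments-style key names (assumption).
TRAINING_ARGS = {
    "num_train_epochs": 40,
    "fp16": True,
    "gradient_checkpointing": True,
}


def validate_dropouts(overrides: dict) -> None:
    """Raise ValueError if any dropout probability is outside [0, 1)."""
    for name, value in overrides.items():
        if not 0.0 <= value < 1.0:
            raise ValueError(f"{name}={value} is not a valid probability")


validate_dropouts(MODEL_CONFIG_OVERRIDES)
```

Keeping model-level and trainer-level settings in separate dictionaries mirrors how they would be passed to two different objects in a typical fine-tuning script.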

Core Capabilities

  • High-accuracy Bokmål speech recognition with 6.33% WER
  • Efficient processing of audio inputs between 0.5 and 30 seconds
  • Integrated language model support
  • Robust performance on parliamentary speech data
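As a rough illustration of working within the 0.5 to 30 second range, the sketch below splits a longer recording into consecutive segments and drops a trailing piece that is too short. The helper names are hypothetical. The `transcribe_file` function assumes the model is published on the Hugging Face Hub as `NbAiLab/nb-wav2vec2-1b-bokmaal` and that the `transformers` library (plus an audio decoder such as ffmpeg) is installed. Decoding with the KenLM language model typically also requires the `pyctcdecode` and `kenlm` packages.

```python
from typing import List, Tuple

MIN_SECONDS = 0.5   # lower bound quoted in the text
MAX_SECONDS = 30.0  # upper bound quoted in the text


def segment_bounds(num_samples: int, sample_rate: int,
                   min_s: float = MIN_SECONDS,
                   max_s: float = MAX_SECONDS) -> List[Tuple[int, int]]:
    """Split a recording into consecutive (start, end) sample ranges.

    Each range is at most max_s seconds long; a range shorter than
    min_s seconds is dropped rather than sent to the model.
    """
    max_len = int(max_s * sample_rate)
    min_len = int(min_s * sample_rate)
    bounds = []
    for start in range(0, num_samples, max_len):
        end = min(start + max_len, num_samples)
        if end - start >= min_len:
            bounds.append((start, end))
    return bounds


def transcribe_file(path: str,
                    model_id: str = "NbAiLab/nb-wav2vec2-1b-bokmaal") -> str:
    """Transcribe one audio file (model_id is an assumed Hub id)."""
    # Imported lazily so segment_bounds works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    return asr(path)["text"]
```

Cutting at fixed 30 second boundaries can split a word in two, so a real pipeline would more likely cut at pauses or use overlapping chunks. The fixed split is only meant to show the duration bounds in action.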

Frequently Asked Questions

Q: What makes this model unique?

The model pairs a 963M-parameter XLS-R network fine-tuned for Norwegian Bokmål with a 5-gram KenLM language model, reaching 6.33% WER and 2.48% CER. It is presented as state of the art for Bokmål speech recognition, with results reported as better than earlier systems.

Q: What are the recommended use cases?

The model is ideal for Norwegian Bokmål speech transcription tasks, particularly in formal contexts like parliamentary speeches, presentations, and professional audio content. It's optimized for audio segments between 0.5 and 30 seconds in length.
