KB-Whisper Large

Property	Value
Developer	KBLab (National Library of Sweden)
Model Type	Speech Recognition (ASR)
Training Data	50,000+ hours of Swedish speech
Model URL	https://huggingface.co/KBLab/kb-whisper-large

What is kb-whisper-large?

KB-Whisper Large is a speech recognition model optimized for Swedish. Developed by KBLab at the National Library of Sweden, it achieves a 47% lower Word Error Rate (WER) than OpenAI's Whisper models across multiple evaluation datasets. It was trained on more than 50,000 hours of Swedish speech, which makes it robust for Swedish-language applications.
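To make the WER figures concrete, here is a minimal sketch of how WER and a relative reduction are commonly computed. It assumes whitespace tokenization and no text normalization (real evaluations usually lowercase and strip punctuation first); the function names and the example values are illustrative, not taken from KBLab's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Made-up example: one of four reference words is dropped.
wer = word_error_rate("jag bor i stockholm", "jag bor stockholm")
print(f"WER: {wer:.2f}")

# Hypothetical WER values, chosen only to show the arithmetic.
print(f"Relative reduction: {relative_reduction(0.20, 0.10):.0%}")
```

A "47% reduction" in this sense is relative: the new WER is 47% lower than the baseline WER, not 47 percentage points lower.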

Implementation Details

The model was trained in two distinct stages: a continued pretraining phase using 56,514 hours of audio with lower quality thresholds, followed by a fine-tuning phase with 8,533 hours of high-quality audio data. The training data includes diverse sources such as subtitles, parliamentary speeches (Riksdag), and specialized Swedish speech corpora.

Supports multiple deployment formats: Hugging Face, whisper.cpp (GGML), ONNX, and ctranslate2
Compatible with popular frameworks like faster-whisper and WhisperX
Includes word-level timestamp capabilities when combined with wav2vec2
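As an illustration of the Hugging Face route, the following sketch loads the model through the `transformers` automatic-speech-recognition pipeline. It assumes `torch` and `transformers` are installed and that the audio file can be decoded locally (typically via ffmpeg). The helper names `build_pipeline` and `transcribe`, the 30-second chunk length, and the device selection are choices made for this example, not settings prescribed by KBLab.

```python
MODEL_ID = "KBLab/kb-whisper-large"  # repository name from the Model URL above


def build_pipeline(model_id: str = MODEL_ID):
    """Create an ASR pipeline on GPU if one is available, otherwise on CPU."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        device=device,
    )


def transcribe(audio_path: str, asr=None) -> str:
    """Transcribe a Swedish audio file and return the recognized text."""
    if asr is None:
        asr = build_pipeline()
    result = asr(
        audio_path,
        chunk_length_s=30,  # split long recordings into 30-second windows
        generate_kwargs={"task": "transcribe", "language": "sv"},
    )
    return result["text"]
```

Calling `transcribe("recording.wav")` (a hypothetical file name) downloads the model on first use. Building the pipeline once and passing it as `asr` avoids reloading the weights for every file.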

Core Capabilities

Swedish speech recognition with 5.4% WER on the FLEURS dataset
Efficient processing with multiple deployment options
Support for both CPU and GPU inference
Word-level timestamp generation when combined with wav2vec2
Flexible integration options with major speech processing libraries

Frequently Asked Questions

Q: What makes this model unique?

KB-Whisper Large significantly outperforms OpenAI's Whisper models on Swedish speech recognition, with even the smaller variants (kb-whisper-small) outperforming OpenAI's larger models. It's specifically optimized for Swedish language processing and trained on an extensive Swedish speech dataset.

Q: What are the recommended use cases?

The model is ideal for Swedish speech transcription tasks, particularly in applications requiring high accuracy such as subtitle generation, parliamentary speech transcription, and general Swedish audio content processing. It can be deployed in various environments from browser-based applications to high-performance server setups.
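For the subtitle use case, timed transcript segments can be written out in SubRip (SRT) format. The sketch below assumes segments arrive as `(start_seconds, end_seconds, text)` tuples; that shape is an assumption for this example, and output from a particular library would need to be mapped into it first.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn (start, end, text) tuples into the text of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Made-up segments, only to show the output shape.
example = [(0.0, 1.5, "Hej och välkomna."), (1.5, 4.0, "Vi börjar med dagordningen.")]
print(segments_to_srt(example))
```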
