kb-whisper-large

KBLab

KB-Whisper Large - a Swedish-optimized speech recognition model, trained on 50,000+ hours of Swedish speech, that achieves a 47% reduction in Word Error Rate (WER) compared to OpenAI's Whisper models

Developer: KBLab (National Library of Sweden)
Model Type: Speech Recognition (ASR)
Training Data: 50,000+ hours of Swedish speech
Model URL: https://huggingface.co/KBLab/kb-whisper-large

What is kb-whisper-large?

KB-Whisper Large is a state-of-the-art speech recognition model specifically optimized for Swedish language processing. Developed by the National Library of Sweden, it represents a significant improvement over OpenAI's Whisper models, achieving a 47% reduction in Word Error Rate (WER) across multiple evaluation datasets. The model was trained on an extensive dataset of over 50,000 hours of Swedish speech, making it particularly robust for Swedish language applications.

Implementation Details

The model was trained in two distinct stages: a continued pretraining phase using 56,514 hours of audio with lower quality thresholds, followed by a fine-tuning phase with 8,533 hours of high-quality audio data. The training data includes diverse sources such as subtitles, parliamentary speeches (Riksdag), and specialized Swedish speech corpora.

  • Supports multiple deployment formats: Hugging Face, whisper.cpp (GGML), ONNX, and ctranslate2
  • Compatible with popular frameworks like faster-whisper and WhisperX
  • Includes word-level timestamp capabilities when combined with wav2vec2
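The Hugging Face deployment path can be sketched as follows. This is a minimal illustration, not an official snippet: it assumes the `transformers` library (with a PyTorch backend) is installed, and `build_generate_kwargs` is a hypothetical helper introduced here for clarity.

```python
def build_generate_kwargs(language: str = "sv", task: str = "transcribe") -> dict:
    """Decoding options forwarded to Whisper's generate() call."""
    return {"language": language, "task": task}


def transcribe_file(path: str) -> str:
    """Transcribe a Swedish audio file with KB-Whisper Large.

    Downloads the full model from the Hugging Face Hub on first use,
    so the import is deferred until the function is actually called.
    """
    from transformers import pipeline  # requires `pip install transformers torch`

    asr = pipeline(
        "automatic-speech-recognition",
        model="KBLab/kb-whisper-large",
        chunk_length_s=30,  # long-form audio is processed in 30-second chunks
    )
    return asr(path, generate_kwargs=build_generate_kwargs())["text"]
```

Calling `transcribe_file("audio.wav")` (where `audio.wav` is a placeholder path to a Swedish recording) would return the transcript as a string.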

Core Capabilities

  • Superior Swedish speech recognition with 5.4% WER on FLEURS dataset
  • Efficient processing with multiple deployment options
  • Support for both CPU and GPU inference
  • Accurate word-level timestamp generation
  • Flexible integration options with major speech processing libraries
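CPU-friendly inference via the ctranslate2 backend can be sketched with faster-whisper. This is an assumption-laden sketch: it presumes `pip install faster-whisper` and that the hub id resolves to a ctranslate2 export; `compute_type_for` is a hypothetical helper added here for illustration.

```python
def compute_type_for(device: str) -> str:
    """Pick a quantization mode: int8 keeps CPU inference memory-efficient."""
    return "int8" if device == "cpu" else "float16"


def transcribe_fast(path: str, device: str = "cpu") -> list:
    """Sketch of transcription with faster-whisper (ctranslate2 backend).

    Returns a list of (start_seconds, end_seconds, text) tuples.
    """
    from faster_whisper import WhisperModel  # requires `pip install faster-whisper`

    model = WhisperModel(
        "KBLab/kb-whisper-large",
        device=device,
        compute_type=compute_type_for(device),
    )
    segments, info = model.transcribe(path, language="sv")
    return [(seg.start, seg.end, seg.text) for seg in segments]
```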

Frequently Asked Questions

Q: What makes this model unique?

KB-Whisper Large significantly outperforms OpenAI's Whisper models on Swedish speech recognition, with even the smaller variants (kb-whisper-small) outperforming OpenAI's larger models. It's specifically optimized for Swedish language processing and trained on an extensive Swedish speech dataset.

Q: What are the recommended use cases?

The model is ideal for Swedish speech transcription tasks, particularly in applications requiring high accuracy such as subtitle generation, parliamentary speech transcription, and general Swedish audio content processing. It can be deployed in various environments from browser-based applications to high-performance server setups.
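For the subtitle-generation use case, timestamped segments can be turned into the SubRip (SRT) format with a small, pure-Python helper. This is a hypothetical utility, not part of the model's API; the `(start, end, text)` tuple shape is an assumption matching what faster-whisper-style segments provide.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render an iterable of (start_seconds, end_seconds, text) as SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 1.2, "Hej")])` yields a single numbered cue covering the first 1.2 seconds.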
