kb-whisper-large

Maintained By
KBLab

KB-Whisper Large

Developer: KBLab (National Library of Sweden)
Model Type: Speech Recognition (ASR)
Training Data: 50,000+ hours of Swedish speech
Model URL: https://huggingface.co/KBLab/kb-whisper-large

What is kb-whisper-large?

KB-Whisper Large is a state-of-the-art speech recognition model optimized specifically for Swedish. It was developed by KBLab at the National Library of Sweden and substantially outperforms OpenAI's Whisper models on Swedish, reducing Word Error Rate (WER) by 47% on average across multiple evaluation datasets. It was trained on more than 50,000 hours of Swedish speech, which makes it robust for Swedish-language applications.
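The 47% figure is a relative reduction: it compares the two error rates directly rather than subtracting percentage points. The block below assumes the standard definition of relative reduction, because the exact averaging procedure behind the reported figure is not given here.

```latex
\[
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{OpenAI}} - \mathrm{WER}_{\text{KB}}}{\mathrm{WER}_{\text{OpenAI}}}
\quad\Longrightarrow\quad
\mathrm{WER}_{\text{KB}} = (1 - \Delta_{\text{rel}})\,\mathrm{WER}_{\text{OpenAI}}
\]
```

As a purely hypothetical illustration, a baseline WER of 10.0% reduced by 47% would give 0.53 × 10.0% = 5.3%.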

Implementation Details

The model was trained in two stages. The first was continued pretraining on 56,514 hours of audio selected with looser quality thresholds. The second was fine-tuning on 8,533 hours of higher-quality audio. The training data draws on diverse sources, including subtitles, parliamentary speeches from the Riksdag, and dedicated Swedish speech corpora.
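The stage-2 set is about 15% of the stage-1 hours (8,533 / 56,514 ≈ 0.151). The sketch below shows how a looser and a stricter threshold on the same quality score can produce a large pretraining set and a smaller fine-tuning set. It is not KBLab's actual pipeline. The `mismatch` score (for example, disagreement between reference text and an automatic transcript), the threshold values, and the sample segments are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    audio_id: str
    hours: float
    # Hypothetical quality score: disagreement between the reference text
    # (e.g. a subtitle) and an automatic transcript. Lower is better.
    mismatch: float


def select(segments, max_mismatch):
    """Keep segments whose mismatch score is at or below the threshold."""
    return [s for s in segments if s.mismatch <= max_mismatch]


def total_hours(segments):
    return sum(s.hours for s in segments)


# Hypothetical thresholds: loose for continued pretraining, strict for fine-tuning.
STAGE1_MAX_MISMATCH = 0.4
STAGE2_MAX_MISMATCH = 0.1

corpus = [
    Segment("sub_001", 1.5, 0.05),
    Segment("sub_002", 2.0, 0.30),
    Segment("riksdag_001", 3.0, 0.08),
    Segment("sub_003", 0.5, 0.60),
]

stage1 = select(corpus, STAGE1_MAX_MISMATCH)  # drops only sub_003
stage2 = select(stage1, STAGE2_MAX_MISMATCH)  # strict subset of stage 1
print(total_hours(stage1), total_hours(stage2))
```

Filtering stage 2 from the stage-1 output guarantees that every fine-tuning segment also passed the looser pretraining filter.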

  • Available in multiple deployment formats: Hugging Face Transformers, whisper.cpp (GGML), ONNX, and CTranslate2
  • Compatible with popular frameworks such as faster-whisper and WhisperX
  • Supports word-level timestamps when combined with a wav2vec2 alignment model
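For the Hugging Face format, a minimal transcription sketch might look like the following. It assumes recent versions of `torch` and `transformers` are installed. The helper names `generate_kwargs` and `transcribe` are hypothetical, and the 30-second chunk length is a common Whisper setting rather than a documented KBLab recommendation.

```python
MODEL_ID = "KBLab/kb-whisper-large"


def generate_kwargs(language: str = "sv", task: str = "transcribe") -> dict:
    """Decoding options forwarded to Whisper's generate(); "sv" selects Swedish."""
    return {"language": language, "task": task}


def transcribe(path: str, timestamps: bool = False) -> dict:
    """Transcribe one audio file with the Hugging Face Transformers ASR pipeline.

    torch and transformers are imported lazily, so this module can be imported
    (and generate_kwargs used) without them installed.
    """
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device=0 if use_gpu else -1,  # 0 = first CUDA device, -1 = CPU
    )
    call_kwargs = {
        "chunk_length_s": 30,  # split long audio into 30-second windows
        "generate_kwargs": generate_kwargs(),
    }
    if timestamps:
        # Segment-level timestamps; the result then includes a "chunks" list.
        call_kwargs["return_timestamps"] = True
    return asr(path, **call_kwargs)
```

Usage would be along the lines of `transcribe("intervju.wav")["text"]`, where the file name is hypothetical. This path does not perform wav2vec2 alignment. For wav2vec2-aligned word timestamps, WhisperX is the relevant tool.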

Core Capabilities

  • State-of-the-art Swedish speech recognition, with a 5.4% WER on the FLEURS dataset
  • Efficient inference through multiple deployment formats
  • Support for both CPU and GPU inference
  • Word-level timestamp generation via wav2vec2 alignment
  • Integration with major speech processing libraries
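WER figures such as the 5.4% on FLEURS count word-level errors against a reference transcript: (substitutions + deletions + insertions) divided by the number of reference words. The function below is a plain-Python sketch of that definition using a word-level Levenshtein distance. Published benchmarks also normalize text (casing, punctuation) before scoring. That step is omitted here, so the sketch will not reproduce reported numbers exactly. `word_error_rate` is a hypothetical helper name, and libraries such as jiwer provide production implementations.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # At the start of iteration i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("jag har en hund", "jag har hund"))  # one deletion -> 0.25
```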

Frequently Asked Questions

Q: What makes this model unique?

KB-Whisper Large significantly outperforms OpenAI's Whisper models on Swedish speech recognition. Even the smaller variants, such as kb-whisper-small, outperform OpenAI's larger models. The model is optimized specifically for Swedish and trained on an extensive Swedish speech dataset.

Q: What are the recommended use cases?

The model is well suited to Swedish speech transcription tasks, particularly applications that require high accuracy, such as subtitle generation, parliamentary speech transcription, and general processing of Swedish audio content. It can be deployed in environments ranging from browser-based applications to high-performance servers.
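For subtitle generation, timestamped transcription output must be converted into a subtitle file. The sketch below converts chunks to SRT cues. It assumes the chunk shape `{"text": ..., "timestamp": (start, end)}` that the Transformers ASR pipeline returns with `return_timestamps=True`. The helper names and sample Swedish lines are hypothetical. A production subtitler would also merge or split chunks to respect line-length and reading-speed limits.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert timestamped chunks into numbered SRT cues separated by blank lines.

    Assumes each chunk looks like {"text": str, "timestamp": (start, end)}.
    """
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # fall back to a zero-length cue if no end time exists
            end = start
        cues.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(cues)


# Hypothetical pipeline output used only for illustration.
example_chunks = [
    {"text": " Hej och välkommen.", "timestamp": (0.0, 2.5)},
    {"text": " Idag pratar vi om tal.", "timestamp": (2.5, 61.04)},
]
print(chunks_to_srt(example_chunks))
```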
