nb-whisper-tiny-verbatim
| Property | Value |
|---|---|
| Parameters | 39M |
| Model Type | Whisper ASR |
| License | Apache 2.0 |
| Languages | Norwegian (Bokmål/Nynorsk), English |
| Developer | National Library of Norway (NB AI-Lab) |
What is nb-whisper-tiny-verbatim?
nb-whisper-tiny-verbatim is a Norwegian automatic speech recognition (ASR) model designed for verbatim transcription. Built on OpenAI's Whisper architecture, it was fine-tuned for an additional 200 steps to produce lowercase text without punctuation, which makes it suitable for linguistic analysis and other work that needs an exact record of what was said.
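To make the output style concrete, here is a minimal sketch that converts ordinary text into the same lowercase, punctuation-free form, for example to prepare a reference transcript before comparing it with the model's output. The function name `to_verbatim_style` is hypothetical, and the exact normalization rules (dropping every Unicode punctuation and symbol character) are an assumption, not the rules used in the model's training data.

```python
import unicodedata


def to_verbatim_style(text: str) -> str:
    """Lowercase the text and drop punctuation (assumed normalization rules)."""
    lowered = text.lower()
    # Drop characters whose Unicode category is punctuation (P*) or symbol (S*).
    # Letters such as æ, ø and å are kept because they are in the letter categories.
    kept = [ch for ch in lowered if not unicodedata.category(ch).startswith(("P", "S"))]
    # Collapse any runs of whitespace left behind by the removed characters.
    return " ".join("".join(kept).split())
```

Note that this sketch removes hyphens and apostrophes outright, so hyphenated compounds are joined into one word; whether that matches the model's behaviour would need to be checked against real output.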
Implementation Details
The model uses the Whisper tiny architecture (39M parameters) and was trained on a diverse dataset of 8 million samples, totaling 66,000 hours of speech. It is optimized for CPU execution and handles a range of Norwegian dialects and accents.
- Verbatim output format with no automatic correction
- Supports both Bokmål and Nynorsk transcription
- Includes English translation capabilities
- Optimized for 28-second audio chunks
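As an illustration of the 28-second chunking mentioned above, the sketch below computes where chunk boundaries would fall in a longer recording. It assumes 16 kHz audio, the sample rate Whisper models expect, and uses simple non-overlapping windows; real inference frameworks typically add overlap between chunks and merge the results. The function name `chunk_bounds` is hypothetical.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16_000,   # Whisper models expect 16 kHz audio
    chunk_s: float = 28.0,       # chunk length taken from the list above
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive, non-overlapping chunks."""
    chunk_len = int(chunk_s * sample_rate)
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]
```

For a 60-second recording this yields two full 28-second chunks followed by a 4-second remainder.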
Core Capabilities
- Accurate transcription of Norwegian speech
- Word-level and sentence-level timestamp generation
- Multi-dialect support
- Cross-lingual translation features
- Integration with popular frameworks (HuggingFace, WhisperX, Whisper.cpp)
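A rough sketch of how these capabilities might be used through the HuggingFace `transformers` ASR pipeline follows. The Hub identifier `NbAiLab/nb-whisper-tiny-verbatim`, the language codes, and the helper names `load_asr` and `call_options` are assumptions for illustration; check the official model card for the recommended settings.

```python
from typing import Any, Dict

# Assumed Hub identifier; verify against the official model card.
MODEL_ID = "NbAiLab/nb-whisper-tiny-verbatim"


def load_asr(model_id: str = MODEL_ID):
    """Create a transformers ASR pipeline (downloads the model on first use)."""
    # Imported lazily so the option helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def call_options(
    task: str = "transcribe",      # "translate" requests translated output
    language: str = "no",          # assumed codes: "no" Bokmål, "nn" Nynorsk, "en" English
    word_timestamps: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for one pipeline call."""
    return {
        "chunk_length_s": 28,
        # True gives segment-level timestamps, "word" gives word-level ones.
        "return_timestamps": "word" if word_timestamps else True,
        "generate_kwargs": {"task": task, "language": language},
    }
```

With these helpers, a call would look like `asr = load_asr()` followed by `result = asr("speech.wav", **call_options())`, where `speech.wav` is a placeholder file name. The transcript is then in `result["text"]` and the timestamped segments in `result["chunks"]`.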
Frequently Asked Questions
Q: What makes this model unique?
The model is designed for verbatim transcription: it keeps the words exactly as spoken, without automatic correction or formatting. This makes it a good fit for linguistic research and analysis, where precise transcription is crucial.
Q: What are the recommended use cases?
The model is well suited to linguistic analysis, academic research, legal transcription, and other applications that need word-for-word transcription of Norwegian speech. Because it is optimized for CPU use, it is accessible to users without specialized hardware.