whisper-tiny

openai

Compact speech recognition model from OpenAI's Whisper family, supporting 99 languages. OpenAI's model table lists it at 39M parameters; the checkpoint described here counts 37.8M. Suited to lightweight ASR tasks.

Property	Value
Parameter Count	37.8M parameters
Model Type	Automatic Speech Recognition
License	Apache 2.0
Paper	Robust Speech Recognition via Large-Scale Weak Supervision (Radford et al., 2022)

What is whisper-tiny?

Whisper-tiny is the smallest variant of OpenAI's Whisper family, built for automatic speech recognition and speech translation. It is a transformer-based encoder-decoder model that supports 99 languages with 37.8M parameters, trading some accuracy for low memory and compute requirements.

Implementation Details

The model uses a sequence-to-sequence architecture trained on 680,000 hours of labelled audio covering multiple languages and tasks. It converts incoming audio to log-Mel spectrograms, and it switches between transcription and translation through special tokens placed at the start of the decoder sequence.
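To make the decoder prompt idea concrete, here is a minimal sketch of how the leading special tokens could be assembled. The token spellings follow the Whisper tokenizer convention; the function name `build_decoder_prompt` and its parameters are hypothetical and exist only for illustration.

```python
from typing import List

def build_decoder_prompt(language: str, task: str = "transcribe",
                         timestamps: bool = False) -> List[str]:
    """Return the special tokens that start a multilingual Whisper decoding run.

    Hypothetical helper for illustration; a real tokenizer maps these strings
    to integer ids before they reach the decoder.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = [
        "<|startoftranscript|>",  # marks the start of decoding
        f"<|{language}|>",        # language code, e.g. "en" or "fr"
        f"<|{task}|>",            # same-language text, or translation to English
    ]
    if not timestamps:
        tokens.append("<|notimestamps|>")  # ask for plain text without time markers
    return tokens

print(build_decoder_prompt("fr", task="translate"))
```

Changing only the task token switches the same audio from French transcription to English translation, which is why a single checkpoint can serve both tasks.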

Transcribes English as well as other languages; a separate English-only checkpoint (whisper-tiny.en) also exists
Handles audio chunks of up to 30 seconds
Includes timestamp prediction capabilities
Weights are stored as F32 tensors
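The 30-second limit in the list above means longer recordings have to be split before they reach the model. The sketch below computes non-overlapping chunk boundaries in samples. It assumes 16 kHz mono input, which is the rate Whisper's feature extractor expects but is not stated on this page; `chunk_bounds` is a hypothetical helper name.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000   # assumption: 16 kHz mono audio
CHUNK_SECONDS = 30     # maximum window length from the list above

def chunk_bounds(num_samples: int,
                 sample_rate: int = SAMPLE_RATE,
                 chunk_seconds: int = CHUNK_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive windows of at most chunk_seconds."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk = sample_rate * chunk_seconds
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]

# A 75-second recording becomes two full windows and one 15-second remainder.
print(chunk_bounds(75 * SAMPLE_RATE))
```

Production pipelines usually overlap neighbouring windows and merge the decoded text so that words cut at a boundary are not lost; this sketch leaves that step out.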

Core Capabilities

Multilingual ASR supporting 99 languages
Speech-to-text transcription, with a reported 7.54% word error rate (WER) on the LibriSpeech test-clean set
Speech translation to English
Long-form transcription through chunking
Robust performance across various accents and background noise conditions
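For readers unfamiliar with the WER figure quoted in the list above, the following is a minimal sketch of the metric: word-level edit distance divided by the number of reference words. It splits on whitespace only and applies no text normalization, so it will not reproduce published numbers exactly; `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 7.54% corresponds to roughly one word error in every thirteen reference words.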

Frequently Asked Questions

Q: What makes this model unique?

Whisper-tiny is the smallest model in the Whisper family, yet it still covers 99 languages, transcription, and translation to English. At 37.8M parameters and a reported 7.54% WER on LibriSpeech test-clean, it is accurate enough for many tasks while remaining deployable in resource-constrained environments.

Q: What are the recommended use cases?

The model suits lightweight ASR applications, development and testing environments, and deployments where memory or compute is limited. It is best matched to English transcription, basic multilingual transcription, and prototyping of speech recognition systems.
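For prototyping, one possible route is the Hugging Face `transformers` pipeline. The sketch below assumes the checkpoint is published under the Hub id `openai/whisper-tiny`, that `transformers` and a backend such as PyTorch are installed, and that ffmpeg is available for decoding audio files; none of these are stated on this page. The helpers `build_generate_kwargs` and `transcribe` are hypothetical names.

```python
from typing import Any, Dict, Optional

MODEL_ID = "openai/whisper-tiny"  # assumed Hub id, not given on this page

def build_generate_kwargs(task: str = "transcribe",
                          language: Optional[str] = None) -> Dict[str, Any]:
    """Collect the generation options that select the task and, optionally, the language."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    kwargs: Dict[str, Any] = {"task": task}
    if language is not None:
        kwargs["language"] = language  # e.g. "french"; omit to let the model detect it
    return kwargs

def transcribe(audio_path: str, task: str = "transcribe",
               language: Optional[str] = None,
               timestamps: bool = False) -> Dict[str, Any]:
    """Run the ASR pipeline on one file; downloads the model on first use."""
    from transformers import pipeline  # imported lazily so the helper above has no dependencies

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # split long recordings into 30-second windows
    )
    return asr(
        audio_path,
        return_timestamps=timestamps,
        generate_kwargs=build_generate_kwargs(task, language),
    )
```

Calling `transcribe("meeting.wav", task="translate", language="french")` would, under these assumptions, return a dictionary whose `"text"` entry holds the English translation; the file name is a placeholder.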