# Lite-Whisper Large-v3-turbo-acc
| Property | Value |
|---|---|
| Encoder Size | 421M parameters |
| Decoder Size | 172M parameters |
| Average WER | 10.2% |
| Paper | LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation |
## What is lite-whisper-large-v3-turbo-acc?
Lite-Whisper large-v3-turbo-acc is an optimized version of OpenAI's Whisper model, developed by efficient-speech. It represents a significant achievement in model compression, maintaining near-identical performance to the original large-v3 model while reducing the encoder size by approximately 34%.
## Implementation Details

The model applies LiteASR compression techniques to achieve efficient automatic speech recognition. It pairs a 421M-parameter encoder with a 172M-parameter decoder, striking a balanced trade-off between model size and performance.
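The quoted ~34% encoder reduction can be sanity-checked with quick arithmetic. Note the original encoder size below (~635M parameters, the commonly cited figure for the Whisper large-v3/turbo encoder) is an assumption not stated in this card:

```python
# Sanity-check the ~34% encoder compression claim.
# 635M is the approximate parameter count of the original
# Whisper large-v3(-turbo) encoder (assumed, not from this card).
original_encoder_m = 635
lite_encoder_m = 421  # compressed encoder size from the table above

reduction = 1 - lite_encoder_m / original_encoder_m
print(f"Encoder size reduction: {reduction:.1%}")  # ~33.7%, i.e. roughly 34%
```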
- Compressed encoder architecture maintaining 10.2% WER
- Turbo variant with optimized decoder size
- Compatible with HuggingFace Transformers library
- Supports 16-bit floating point operations
## Core Capabilities
- Speech recognition with near-original model accuracy
- Efficient processing with reduced computational requirements
- Direct integration with popular audio processing libraries
- Support for various audio input formats through librosa
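The capabilities above translate into a short Transformers + librosa pipeline. This is a minimal sketch, not an official recipe: the model ID, the `trust_remote_code=True` flag, and the reuse of the stock `openai/whisper-large-v3-turbo` processor are assumptions based on typical Hugging Face usage for custom compressed architectures.

```python
def transcribe(audio_path: str) -> str:
    """Transcribe an audio file with lite-whisper-large-v3-turbo-acc.

    Sketch under stated assumptions; heavy imports live inside the
    function so nothing is downloaded until transcription is requested.
    """
    import librosa
    import torch
    from transformers import AutoModel, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # The compressed encoder is a custom architecture, hence
    # trust_remote_code (assumed requirement, not stated in this card).
    model = AutoModel.from_pretrained(
        "efficient-speech/lite-whisper-large-v3-turbo-acc",
        trust_remote_code=True,
        torch_dtype=torch.float16,  # the card notes FP16 support
    ).to(device)

    # Assumed: feature extraction/tokenization is shared with the base turbo model.
    processor = AutoProcessor.from_pretrained("openai/whisper-large-v3-turbo")

    # Whisper expects 16 kHz mono audio; librosa resamples on load.
    audio, _ = librosa.load(audio_path, sr=16_000)
    features = processor(
        audio, sampling_rate=16_000, return_tensors="pt"
    ).input_features.to(device, torch.float16)

    predicted_ids = model.generate(features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
```

On a machine with a CUDA GPU, `transcribe("sample.wav")` returns the decoded transcript; on CPU you may want to drop the FP16 cast.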
## Frequently Asked Questions

**Q: What makes this model unique?**
This model maintains nearly the same accuracy as the original Whisper large-v3 (10.2% vs. 10.1% average WER) while significantly reducing the model size through advanced compression techniques.
**Q: What are the recommended use cases?**
The model is ideal for applications requiring high-quality speech recognition while operating under computational constraints. It's particularly suitable for production environments where model efficiency is crucial without compromising accuracy.