WhisperKit Core ML
| Property | Value |
|---|---|
| Author | argmaxinc |
| Model Type | Automatic Speech Recognition |
| Base Size | 3100 MB |
| Optimized Variants | 1307 MB, 1049 MB |
What is whisperkit-coreml_01-30-24?
WhisperKit Core ML is an implementation of OpenAI's Whisper model optimized for Apple Silicon devices. It offers several variants that trade file size against accuracy; the full-size base model achieves a 2.44% word error rate (WER) on LibriSpeech test sets.
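WER counts the word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words, so 2.44% means roughly 2.44 errors per 100 reference words. A minimal sketch of that metric follows; the function name `word_error_rate` is our own, and unlike benchmark pipelines it applies no text normalization (casing, punctuation) before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("the") over six reference words gives a WER of 1/6.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A corpus-level figure such as 2.44% is normally computed by summing edits and reference words over the whole test set rather than averaging per-utterance rates.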
Implementation Details
The model comes in several variants: a turbo-optimized version for streaming transcription and quantized versions with smaller files. Mixed-bit quantization shrinks the model from 3100 MB to 1307 MB (about 58% smaller) at a small accuracy cost (2.6% vs. 2.44% WER); the 1049 MB variant (about 66% smaller) gives up more accuracy (4.81% WER).
- Base model: 3100 MB, 2.44% WER
- Turbo variant: 3100 MB, 2.41% WER
- Compressed variant: 1307 MB, 2.6% WER
- Ultra-compressed variant: 1049 MB, 4.81% WER
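The size and accuracy trade-offs in the list can be checked with a little arithmetic. The sketch below uses only the sizes and WER figures listed above; the implied bits-per-weight value additionally assumes (our assumption, not stated in the card) that the 3100 MB base stores every weight at 16 bits and that file size is dominated by weights. Under that assumption, a non-integer average such as ~6.7 bits is consistent with mixing different bit widths across layers.

```python
# Sizes (MB) and WER (%) as listed in this card.
VARIANTS = {
    "Base": (3100, 2.44),
    "Turbo": (3100, 2.41),
    "Compressed": (1307, 2.6),
    "Ultra-compressed": (1049, 4.81),
}

# Assumption (not from the card): base weights are 16-bit and dominate file size.
ASSUMED_BASE_BITS = 16


def summarize(name: str) -> tuple[float, float, float]:
    """Return (size reduction %, implied avg bits/weight, WER change in points) vs. Base."""
    base_size, base_wer = VARIANTS["Base"]
    size_mb, wer = VARIANTS[name]
    reduction_pct = 100 * (1 - size_mb / base_size)
    avg_bits = ASSUMED_BASE_BITS * size_mb / base_size
    return reduction_pct, avg_bits, wer - base_wer


for name in ("Compressed", "Ultra-compressed"):
    reduction, bits, delta = summarize(name)
    print(f"{name}: {reduction:.1f}% smaller, ~{bits:.1f} bits/weight, WER {delta:+.2f} pts")
```

This prints `Compressed: 57.8% smaller, ~6.7 bits/weight, WER +0.16 pts` and `Ultra-compressed: 66.2% smaller, ~5.4 bits/weight, WER +2.37 pts`.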
Core Capabilities
- High-accuracy speech recognition with 97-100% Quality of Inference (QoI)
- Optimized performance on Apple Silicon
- Streaming transcription support in turbo variants
- Multiple size-optimized versions for different deployment scenarios
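The card does not define QoI. One reading, which we assume here, is the percentage of test examples on which a variant does no worse than a reference model. Under that assumption the metric is a paired comparison of per-example WERs, sketched below; the function name `quality_of_inference` is hypothetical and the numbers are made up for illustration, not benchmark results.

```python
from typing import Sequence


def quality_of_inference(reference_wers: Sequence[float], variant_wers: Sequence[float]) -> float:
    """Percentage of examples where the variant's WER is <= the reference's WER.

    Assumed definition of QoI; per-example WERs must be paired by index.
    """
    if not reference_wers or len(reference_wers) != len(variant_wers):
        raise ValueError("expected two non-empty, equal-length sequences")
    no_worse = sum(1 for ref, var in zip(reference_wers, variant_wers) if var <= ref)
    return 100.0 * no_worse / len(reference_wers)


# Hypothetical per-example WERs for four utterances (illustration only).
reference = [0.00, 0.05, 0.10, 0.00]
variant = [0.00, 0.05, 0.12, 0.00]
print(quality_of_inference(reference, variant))  # 3 of 4 no worse -> 75.0
```

Because ties count as "no worse", a variant that matches the reference on every example scores 100%, which is how a heavily compressed model could still report a QoI near the top of the 97-100% range.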
Frequently Asked Questions
Q: What makes this model unique?
WhisperKit Core ML stands out for its optimized performance on Apple Silicon and its range of size-accuracy trade-offs. Developers can choose the variant that fits their needs, from the 3100 MB models (2.41-2.44% WER) down to the 1049 MB variant (4.81% WER).
Q: What are the recommended use cases?
The model suits iOS and macOS applications that need high-quality speech recognition, especially those requiring streaming transcription. Choose a variant based on your storage constraints and accuracy requirements.
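Choosing a variant from storage and accuracy budgets is a small filtering problem. The sketch below encodes this card's sizes and WERs; the `Variant` type and `pick_variant` function are hypothetical, and marking only Turbo as streaming-capable is our reading of "streaming transcription support in turbo variants". It returns the smallest variant that fits both budgets, breaking size ties by lower WER.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    size_mb: int
    wer_pct: float
    streaming: bool  # assumption: only the turbo variant supports streaming


# Sizes and WERs from this card.
CATALOG = [
    Variant("Base", 3100, 2.44, False),
    Variant("Turbo", 3100, 2.41, True),
    Variant("Compressed", 1307, 2.6, False),
    Variant("Ultra-compressed", 1049, 4.81, False),
]


def pick_variant(max_size_mb: float, max_wer_pct: float, need_streaming: bool = False) -> Optional[Variant]:
    """Return the smallest variant within both budgets, or None if nothing fits."""
    candidates = [
        v for v in CATALOG
        if v.size_mb <= max_size_mb
        and v.wer_pct <= max_wer_pct
        and (v.streaming or not need_streaming)
    ]
    # Smallest file first; among equal sizes, prefer the lower WER.
    return min(candidates, key=lambda v: (v.size_mb, v.wer_pct), default=None)


print(pick_variant(1500, 3.0).name)                       # Compressed
print(pick_variant(4000, 5.0, need_streaming=True).name)  # Turbo
```

Preferring the smallest qualifying variant is one reasonable policy for on-device apps where download size matters; an app that prioritizes accuracy could sort by `wer_pct` first instead.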