WhisperKit Core ML

Property	Value
Author	argmaxinc
Model Type	Automatic Speech Recognition
Base Size	3100 MB
Optimized Variants	1307 MB, 1049 MB

What is whisperkit-coreml_01-30-24?

WhisperKit Core ML is a Core ML implementation of OpenAI's Whisper speech recognition model, optimized for Apple Silicon devices. It ships in several variants that trade file size against accuracy; the base model achieves a 2.44% Word Error Rate (WER) on LibriSpeech test sets.
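
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not the evaluation code behind the figures on this card, and the function name `word_error_rate` is hypothetical. Published evaluations usually normalize text (case, punctuation) before scoring; this card does not say which normalization was used, so none is applied here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a 2.44% WER corresponds to roughly one word error in every 41 reference words.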

Implementation Details

The model comes in several variants, including a turbo-optimized version for streaming transcription and quantized versions for reduced file size. Mixed-bit quantization cuts the file size from 3100 MB to 1307 MB or 1049 MB, at a cost of 0.16 and 2.37 percentage points of WER respectively.

Base model: 3100 MB with 2.44% WER
Turbo variant: 3100 MB with 2.41% WER
Compressed variant: 1307 MB with 2.6% WER
Ultra-compressed variant: 1049 MB with 4.81% WER
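
To make the trade-off concrete, the figures listed above can be compared against the base model. The snippet is a minimal sketch that uses only the sizes and WER values stated on this card; the short variant names and the helper `compare_to_base` are hypothetical labels chosen for illustration, not identifiers from WhisperKit.

```python
# (size in MB, WER in percent), copied from the list above.
VARIANTS = {
    "base": (3100, 2.44),
    "turbo": (3100, 2.41),
    "compressed": (1307, 2.6),
    "ultra-compressed": (1049, 4.81),
}


def compare_to_base(name):
    """Return (size reduction in %, WER change in percentage points) vs. base."""
    base_size, base_wer = VARIANTS["base"]
    size, wer = VARIANTS[name]
    size_reduction = round(100 * (1 - size / base_size), 1)
    wer_delta = round(wer - base_wer, 2)
    return size_reduction, wer_delta


for name in VARIANTS:
    reduction, delta = compare_to_base(name)
    print(f"{name}: {reduction}% smaller, WER {delta:+.2f} points vs base")
```

By this arithmetic the compressed variant is about 57.8% smaller for a 0.16-point WER increase, while the ultra-compressed variant saves about 66.2% at a 2.37-point increase.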

Core Capabilities

High-accuracy speech recognition with 97-100% Quality of Inference (QoI)
Optimized performance on Apple Silicon
Streaming transcription support in turbo variants
Multiple size-optimized versions for different deployment scenarios

Frequently Asked Questions

Q: What makes this model unique?

WhisperKit Core ML combines performance optimized for Apple Silicon with a range of variants from 1049 MB to 3100 MB, so developers can trade storage for accuracy: WER ranges from 2.41% for the turbo variant to 4.81% for the ultra-compressed one.

Q: What are the recommended use cases?

The model suits iOS and macOS applications that need high-quality speech recognition. When streaming transcription is needed, the turbo variant is the one optimized for it; otherwise, choose a variant based on storage constraints and accuracy requirements.
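
One way to turn those constraints into a decision is sketched below. It is an illustration, not part of WhisperKit: the `Variant` class, the `pick_variant` function, and the variant names are hypothetical, and the `streaming_optimized` flag assumes that only the turbo variant is tuned for streaming, which is all this card states. Sizes and WER values are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    size_mb: int
    wer_percent: float
    streaming_optimized: bool  # assumption: only "turbo" is flagged


VARIANTS = [
    Variant("base", 3100, 2.44, False),
    Variant("turbo", 3100, 2.41, True),
    Variant("compressed", 1307, 2.6, False),
    Variant("ultra-compressed", 1049, 4.81, False),
]


def pick_variant(
    max_size_mb: int,
    max_wer_percent: float,
    need_streaming: bool = False,
) -> Optional[Variant]:
    """Return the most accurate variant that fits the constraints, or None."""
    candidates = [
        v
        for v in VARIANTS
        if v.size_mb <= max_size_mb
        and v.wer_percent <= max_wer_percent
        and (v.streaming_optimized or not need_streaming)
    ]
    if not candidates:
        return None
    # Prefer the lowest WER; break ties with the smaller file.
    return min(candidates, key=lambda v: (v.wer_percent, v.size_mb))


choice = pick_variant(max_size_mb=1500, max_wer_percent=3.0)
print(choice.name if choice else "no variant fits")
```

With a 1500 MB budget and a 3% WER ceiling this selects the compressed variant; tightening the budget to 1100 MB under the same ceiling leaves no candidate, which signals that one of the two constraints has to be relaxed.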