whisperkit-coreml_01-30-24

Maintained By
argmaxinc

WhisperKit Core ML

Property: Value
Author: argmaxinc
Model Type: Automatic Speech Recognition
Base Size: 3100 MB
Optimized Variants: 1307 MB, 1049 MB

What is whisperkit-coreml_01-30-24?

WhisperKit Core ML is an implementation of OpenAI's Whisper model optimized for Apple Silicon devices. It ships several variants that trade file size against accuracy; the full-size (3100 MB) base model achieves a 2.44% Word Error Rate (WER) on LibriSpeech test sets.
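For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used for this model; the function name `word_error_rate` is hypothetical, and published evaluations typically normalize casing and punctuation before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn the lights off", "turn the light off"))
```

A WER of 2.44% therefore corresponds to roughly one word error per 41 reference words.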

Implementation Details

The model comes in several variants, including a turbo-optimized version for streaming transcription and quantized versions for reduced file size. Mixed-bit quantization cuts the file size from 3100 MB to 1307 MB or 1049 MB (roughly 58% and 66% smaller) at the cost of a higher WER:

  • Base model: 3100 MB with 2.44% WER
  • Turbo variant: 3100 MB with 2.41% WER
  • Compressed variant: 1307 MB with 2.6% WER
  • Ultra-compressed: 1049 MB with 4.81% WER
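The trade-off in the list above can be sketched as a small lookup that reports each variant's size reduction and picks the most accurate variant that fits a storage and accuracy budget. This is an illustrative sketch only: the `Variant` class, `size_reduction_pct`, and `pick_variant` are hypothetical helpers, the labels are the descriptive names from the list rather than actual model file names, and the numbers are copied from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    label: str      # descriptive label from the list above, not a real file name
    size_mb: int    # size on disk in MB
    wer_pct: float  # Word Error Rate on LibriSpeech, in percent


VARIANTS = [
    Variant("Base", 3100, 2.44),
    Variant("Turbo", 3100, 2.41),
    Variant("Compressed", 1307, 2.6),
    Variant("Ultra-compressed", 1049, 4.81),
]


def size_reduction_pct(variant: Variant, baseline: Variant = VARIANTS[0]) -> float:
    """Percentage by which `variant` is smaller than `baseline`."""
    return round(100 * (1 - variant.size_mb / baseline.size_mb), 1)


def pick_variant(max_size_mb: float, max_wer_pct: float) -> Optional[Variant]:
    """Return the lowest-WER variant within both limits, or None if none fits."""
    candidates = [
        v for v in VARIANTS
        if v.size_mb <= max_size_mb and v.wer_pct <= max_wer_pct
    ]
    # Prefer lower WER; break ties by smaller size.
    return min(candidates, key=lambda v: (v.wer_pct, v.size_mb), default=None)


if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.label}: {v.size_mb} MB, {v.wer_pct}% WER, "
              f"{size_reduction_pct(v)}% smaller than Base")
    print(pick_variant(max_size_mb=1500, max_wer_pct=3.0))
```

With a 1500 MB budget and a 3% WER ceiling, the sketch selects the compressed variant; tightening the budget to 1200 MB leaves only the ultra-compressed variant, which then requires accepting its 4.81% WER.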

Core Capabilities

  • High-accuracy speech recognition with 97-100% Quality of Inference (QoI)
  • Optimized performance on Apple Silicon
  • Streaming transcription support in turbo variants
  • Multiple size-optimized versions for different deployment scenarios

Frequently Asked Questions

Q: What makes this model unique?

WhisperKit Core ML stands out for its optimized performance on Apple Silicon devices and its variety of size-performance trade-offs, allowing developers to choose the best variant for their specific needs while maintaining high accuracy.

Q: What are the recommended use cases?

The model is ideal for iOS/macOS applications requiring high-quality speech recognition, particularly when streaming capability is needed. Different variants can be chosen based on storage constraints and accuracy requirements.
