Kotoba-Whisper v2.0
Property | Value |
---|---|
Parameter Count | 756M |
License | Apache 2.0 |
Language | Japanese |
Paper | Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling |
What is kotoba-whisper-v2.0?
Kotoba-Whisper v2.0 is a Japanese speech recognition model developed through a collaboration between Asahi Ushio and Kotoba Technologies. It is a distilled version of OpenAI's Whisper large-v3 built specifically for Japanese ASR, and it runs 6.3x faster than large-v3 while keeping competitive accuracy.
Implementation Details
The model keeps the full encoder of Whisper large-v3 and pairs it with a much smaller two-layer decoder. It was trained on the ReazonSpeech dataset, which comprises over 7.2 million audio clips averaging 5 seconds and 18 text tokens each. Training ran for 8 epochs with a batch size of 256 on audio sampled at 16 kHz.
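To put these training figures in proportion, here is a back-of-the-envelope calculation that uses only the numbers quoted above. It treats 7.2 million as the clip count (the real figure is slightly higher), treats the 5-second and 18-token averages as exact, and assumes one optimizer step per batch of 256 with no gradient accumulation. Those are simplifying assumptions, not reported training details.

```python
import math

# Figures quoted above. Clip length and token count are dataset averages,
# so every derived number is an approximation.
NUM_CLIPS = 7_200_000        # "over 7.2 million" clips, used as a lower bound
AVG_CLIP_SECONDS = 5
AVG_TOKENS_PER_CLIP = 18
EPOCHS = 8
BATCH_SIZE = 256
SAMPLE_RATE_HZ = 16_000


def training_budget(num_clips: int = NUM_CLIPS) -> dict:
    """Estimate the size of the training run from the quoted averages."""
    # Assumption: one optimizer step per batch, no gradient accumulation.
    steps_per_epoch = math.ceil(num_clips / BATCH_SIZE)
    return {
        "audio_hours": num_clips * AVG_CLIP_SECONDS / 3600,
        "samples_per_clip": AVG_CLIP_SECONDS * SAMPLE_RATE_HZ,
        "label_tokens": num_clips * AVG_TOKENS_PER_CLIP,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * EPOCHS,
    }


if __name__ == "__main__":
    for name, value in training_budget().items():
        print(f"{name}: {value:,}")
```

Under those assumptions the run covers roughly 10,000 hours of audio, about 130 million label tokens, 80,000 samples per average clip, and around 225,000 optimizer steps in total.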
- Architecture: Whisper large-v3 encoder (unchanged) with a two-layer decoder
- Training Data: ReazonSpeech dataset with WER filtering
- Speed: 6.3x faster than Whisper large-v3
- Accuracy: Lower CER/WER than large-v3 on the in-domain (ReazonSpeech) test set
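The size and speed figures follow from where the parameters sit. The sketch below estimates the student's size by removing 30 of the teacher's 32 decoder layers. It uses the Whisper large dimensions from the original Whisper paper (width 1280, 32 decoder layers, about 1550M parameters), assumes a feed-forward width of 4x the hidden size, and counts only weight matrices (biases and layer norms are ignored), so the result is an approximation rather than an exact parameter count.

```python
# Whisper large dimensions (from the Whisper paper); 1550M is a rounded figure.
D_MODEL = 1280                  # hidden size
FFN_MULT = 4                    # feed-forward width = 4 * d_model (assumption)
TEACHER_DECODER_LAYERS = 32
STUDENT_DECODER_LAYERS = 2
TEACHER_PARAMS = 1_550_000_000


def decoder_layer_weights(d: int = D_MODEL, ffn_mult: int = FFN_MULT) -> int:
    """Weight-matrix parameters in one decoder layer (no biases or LayerNorms)."""
    self_attn = 4 * d * d           # q, k, v and output projections
    cross_attn = 4 * d * d          # the same projections, attending to encoder output
    ffn = 2 * d * (ffn_mult * d)    # up- and down-projection
    return self_attn + cross_attn + ffn


def estimated_student_params() -> int:
    """Teacher size minus the decoder layers the student drops."""
    dropped = TEACHER_DECODER_LAYERS - STUDENT_DECODER_LAYERS
    return TEACHER_PARAMS - dropped * decoder_layer_weights()


if __name__ == "__main__":
    print(f"per decoder layer: {decoder_layer_weights():,}")
    print(f"estimated student size: {estimated_student_params() / 1e6:.0f}M")
```

The estimate lands at about 764M, within roughly 1% of the 756M listed above. The decoder, unlike the encoder, runs once per generated token, which is the usual explanation (from the Distil-Whisper work) for why trimming the decoder rather than the encoder yields most of the speed-up.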
Core Capabilities
- Efficient Japanese speech recognition with reduced latency
- Support for both short-form and long-form transcription
- Flash Attention 2 compatibility for faster inference
- Segment-level timestamp generation
- Batch processing support for long audio files
Frequently Asked Questions
Q: What makes this model unique?
The model pairs accuracy close to Whisper large-v3 (and better on in-domain data) with a 6.3x speed-up, and it is trained specifically for Japanese. It gets there by keeping the full large-v3 encoder while cutting the decoder to two layers, and by distilling the model on Japanese speech from ReazonSpeech.
Q: What are the recommended use cases?
The model suits Japanese speech recognition tasks where processing speed matters. It handles both short-form audio (under 30 seconds) and long-form transcription; for long audio, sequential processing favors accuracy while chunked processing favors speed.
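The choice between these modes can be sketched as a small planning step. This is a hypothetical, library-independent illustration: it routes clips of 30 seconds or less to short-form decoding, and for longer audio picks chunked decoding when speed matters and sequential decoding otherwise (following the Distil-Whisper guidance that sequential is more accurate and chunked is faster). The 15-second chunk and 2.5-second overlap are arbitrary example values, and the window planner does not reproduce how any particular library stitches overlapping transcripts back together.

```python
from dataclasses import dataclass

SHORT_FORM_LIMIT_S = 30.0  # Whisper's native input window


@dataclass(frozen=True)
class Window:
    start_s: float
    end_s: float


def plan_transcription(duration_s: float, prefer_speed: bool) -> str:
    """Pick a decoding strategy for a clip (hypothetical helper)."""
    if duration_s <= SHORT_FORM_LIMIT_S:
        return "short-form"
    return "chunked" if prefer_speed else "sequential"


def chunk_windows(duration_s: float, chunk_s: float = 15.0,
                  overlap_s: float = 2.5) -> list[Window]:
    """Split audio into fixed-length windows that overlap by overlap_s seconds."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        windows.append(Window(start, end))
        if end >= duration_s:  # the last window reaches the end of the audio
            break
        start += step
    return windows


def batches(items: list, batch_size: int) -> list[list]:
    """Group independent windows so they can be decoded together."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    print(plan_transcription(600.0, prefer_speed=True))
    for batch in batches(chunk_windows(40.0), batch_size=2):
        print(batch)
```

Because the windows are independent, they can be grouped into batches (as `batches` does) and decoded in parallel; sequential decoding instead slides a 30-second window through the audio one step at a time.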