Anime Whisper
| Property | Value |
|---|---|
| Parameter Count | 756M parameters |
| Model Type | Automatic Speech Recognition |
| License | MIT |
| Base Model | kotoba-tech/kotoba-whisper-v2.0 |
What is anime-whisper?
Anime Whisper is a Japanese speech recognition model specialized for anime and game voice acting. It was fine-tuned on more than 5,300 hours of anime-style voice data (about 3.7 million files) so that it transcribes the emotional, expressive speech typical of anime content more accurately than general-purpose models.
Implementation Details
The model starts from kotoba-whisper-v2.0 and was trained in two phases: first only the decoder was trained while the encoder stayed frozen, then the entire model was fine-tuned. Training ran on an H100 NVL GPU for approximately 11.2 days.
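The two-phase schedule can be illustrated with a minimal sketch. It is not the authors' training code: `Param`, `ToyModule`, `ToySeq2Seq`, `set_trainable`, and `configure_phase` are hypothetical names, and the toy classes only mimic the `parameters()` / `requires_grad` convention used by PyTorch modules so the example runs without any framework installed.

```python
class Param:
    """Stand-in for a framework parameter; only the requires_grad flag matters."""
    def __init__(self):
        self.requires_grad = True


class ToyModule:
    """Stand-in module that yields its own and its children's parameters."""
    def __init__(self, n_params, children=()):
        self.own = [Param() for _ in range(n_params)]
        self.children = list(children)

    def parameters(self):
        yield from self.own
        for child in self.children:
            yield from child.parameters()


class ToySeq2Seq(ToyModule):
    """Hypothetical encoder-decoder model: 3 encoder and 2 decoder parameters."""
    def __init__(self):
        self.encoder = ToyModule(3)
        self.decoder = ToyModule(2)
        super().__init__(0, [self.encoder, self.decoder])


def set_trainable(module, trainable):
    """Enable or disable gradient updates for every parameter of `module`."""
    for param in module.parameters():
        param.requires_grad = trainable


def configure_phase(model, phase):
    """Phase 1 trains only the decoder; phase 2 trains the whole model."""
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2")
    set_trainable(model, True)               # start with everything trainable
    if phase == 1:
        set_trainable(model.encoder, False)  # freeze the encoder only


model = ToySeq2Seq()
configure_phase(model, 1)
phase1_flags = [p.requires_grad for p in model.parameters()]
configure_phase(model, 2)
phase2_flags = [p.requires_grad for p in model.parameters()]
print(phase1_flags)  # encoder flags first, then decoder flags
print(phase2_flags)
```

With a real PyTorch model, the same two helper functions would apply as long as the encoder is reachable as an attribute; an optimizer would then typically be built only from parameters whose `requires_grad` is true.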
- Achieves a 13% character error rate (CER) on anime-domain test data
- Handles non-verbal expressions like laughs, sighs, and stutters
- Places punctuation according to speech rhythm
- Maintains high accuracy on emotional and expressive speech
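For readers unfamiliar with the CER figure above, the metric is commonly defined as the character-level edit distance between the model output and the reference transcript, divided by the reference length. The sketch below shows that common definition; the source does not say how the reported 13% was computed (for example, whether punctuation was normalized first), and the function names and sample strings are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Illustrative pair: the hypothesis drops one character of the laugh.
reference = "あはは、そうだね。"
hypothesis = "あは、そうだね。"
print(f"CER = {cer(reference, hypothesis):.3f}")
```

Because Japanese text has no whitespace word boundaries, character-level scoring like this is the usual choice over word error rate.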
Core Capabilities
- Accurate transcription of Japanese anime-style voice acting
- Faithful reproduction of non-verbal utterances
- Natural punctuation placement
- Reduced hallucination compared to general models
- Efficient processing with 756M parameters
Frequently Asked Questions
Q: What makes this model unique?
The model excels at anime-style speech patterns, emotional expressions, and non-verbal utterances that other models typically struggle with. At 756M parameters, it is also lighter than larger speech recognition models.
Q: What are the recommended use cases?
This model is ideal for transcribing anime content, visual novels, and game voice acting. It is particularly effective for content with emotional delivery and non-standard speech patterns. Do not supply an initial prompt, however: prompts can degrade its performance.
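A prompt-free call might look like the sketch below, which uses the Hugging Face Transformers speech recognition pipeline. Several things here are assumptions rather than facts from this page: the repository id `litagin/anime-whisper`, the choice of generation options, the `transcribe` helper, and the audio path passed to it. The only point the sketch is meant to make is that no `prompt_ids` entry is passed to generation.

```python
# Assumed Hugging Face repository id; it is not stated on this page.
MODEL_ID = "litagin/anime-whisper"

# Generation options for Whisper-style models. No "prompt_ids" key is set,
# in line with the advice to avoid initial prompts.
GENERATE_KWARGS = {
    "language": "ja",
    "task": "transcribe",
}


def transcribe(audio_path):
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]
```

Calling `transcribe("clip.wav")` (a hypothetical file) downloads the model on first use, so it needs network access and enough memory for a 756M-parameter model.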