anime-whisper

litagin

A Japanese speech recognition model specialized for anime and game voices, fine-tuned on 5,300 hours of data. It has 756M parameters and achieves a 13% character error rate (CER) on anime-domain speech.

Property         Value
Parameter Count  756M parameters
Model Type       Automatic Speech Recognition
License          MIT
Base Model       kotoba-tech/kotoba-whisper-v2.0

What is anime-whisper?

Anime Whisper is a Japanese speech recognition model built for anime and game voice acting. It was fine-tuned on more than 5,300 hours of anime-style voice data, comprising 3.7 million files, and is tuned to transcribe the emotional, expressive speech typical of anime content.

Implementation Details

Built on kotoba-whisper-v2.0, this model was trained in two phases: first training only the decoder with the encoder frozen, then fine-tuning the entire model. Training ran on an H100 NVL GPU for approximately 11.2 days.
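
The two-phase schedule can be sketched as toggling the `requires_grad` flag on the encoder's parameters. This is a minimal illustration rather than the author's training script: the helper names `set_encoder_trainable` and `count_trainable` are hypothetical, and the code only assumes a PyTorch-style model that exposes `get_encoder()` (as Hugging Face Whisper models do) and `parameters()`.

```python
def set_encoder_trainable(model, trainable):
    """Freeze (False) or unfreeze (True) every encoder parameter.

    Hypothetical helper: assumes `model.get_encoder()` returns a module
    whose `parameters()` yields objects with a `requires_grad` flag,
    as PyTorch modules do.
    """
    for param in model.get_encoder().parameters():
        param.requires_grad = trainable


def count_trainable(model):
    """Return how many parameter tensors will receive gradient updates."""
    return sum(1 for p in model.parameters() if p.requires_grad)


# Phase 1: train only the decoder, with the encoder frozen.
#   set_encoder_trainable(model, False)
#   ... run the training loop ...
# Phase 2: fine-tune the entire model.
#   set_encoder_trainable(model, True)
#   ... continue training ...
```

Checking `count_trainable` before and after each call is a cheap way to confirm that the freeze actually took effect before a long run starts.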

  • Achieves 13% Character Error Rate (CER) on anime domain testing
  • Handles non-verbal expressions like laughs, sighs, and stutters
  • Places punctuation appropriately based on speech rhythm
  • High accuracy for emotional and expressive speech
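
For reference, character error rate is conventionally the character-level edit distance between the model's output and the reference transcript, divided by the reference length. The sketch below shows that standard definition. It is not the evaluation script behind the 13% figure, and any text normalization applied before scoring is not specified here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Two of five characters differ, so the CER is 2 / 5.
print(cer("こんにちは", "こんばんは"))  # 0.4
```

Because Japanese text is not whitespace-delimited, scoring by character avoids depending on a word segmenter, which is why CER rather than word error rate is the usual metric for this kind of model.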

Core Capabilities

  • Accurate transcription of Japanese anime-style voice acting
  • Faithful reproduction of non-verbal utterances
  • Natural punctuation placement
  • Reduced hallucination compared to general models
  • Efficient processing with 756M parameters

Frequently Asked Questions

Q: What makes this model unique?

The model excels at handling anime-style speech patterns, emotional expressions, and non-verbal utterances that other models typically struggle with. At 756M parameters, it maintains high accuracy while remaining relatively lightweight.

Q: What are the recommended use cases?

This model is ideal for transcribing anime content, visual novels, and game voice acting. It is particularly effective for content with emotional delivery and non-standard speech patterns. However, it should be used without an initial prompt, since prompts can degrade its performance.
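
A minimal transcription sketch using the Hugging Face `transformers` pipeline follows. The repository id `litagin/anime-whisper`, the file name `clip.wav`, and the helper names are assumptions that do not come from the text above. The point of the sketch is that the generation arguments set only the language and carry no prompt.

```python
MODEL_ID = "litagin/anime-whisper"  # assumed Hugging Face repository id


def build_generate_kwargs():
    """Generation arguments with the language set and no initial prompt."""
    # Deliberately no `prompt_ids` entry: prompts can degrade this model.
    return {"language": "ja"}


def transcribe(audio_path, model_id=MODEL_ID):
    """Transcribe one audio file; the model is downloaded on first use."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    from transformers import pipeline  # requires `transformers` and `torch`

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]


# Usage (hypothetical file name):
#   print(transcribe("clip.wav"))
```

Keeping the generation arguments in one small function makes it easy to verify that no prompt is ever passed, whichever code path ends up calling the model.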
