faster_CrisperWhisper
Property | Value |
---|---|
Author | nyrahealth |
License | cc-by-nc-4.0 |
Paper | Research Paper |
Downloads | 367 |
What is faster_CrisperWhisper?
faster_CrisperWhisper is a version of CrisperWhisper converted for the faster-whisper framework, designed for precise speech recognition with verbatim transcription. It stands out for capturing every spoken word exactly as uttered, including fillers, pauses, stutters, and false starts: elements that traditional speech recognition models often omit.
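Because the model is distributed in faster-whisper's format, it would normally be loaded through that library's `WhisperModel` class and transcribed with `word_timestamps=True`. The sketch below shows that call only in comments, assuming the Hugging Face repository id `nyrahealth/faster_CrisperWhisper`. It implements a hypothetical helper, `flatten_words`, that turns the returned segments into `(word, start, end)` tuples. The stand-in `Word` and `Segment` tuples only mimic the attributes the helper reads, so the helper can be tried without downloading the model.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Typical loading with faster-whisper (needs the package and a model download);
# the repository id below is an assumption based on the author/model name:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("nyrahealth/faster_CrisperWhisper")
#   segments, info = model.transcribe("audio.wav", word_timestamps=True)
#   words = flatten_words(segments)


def flatten_words(segments: Iterable) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) tuples from faster-whisper-style segments.

    Each segment is expected to expose a `.words` list (filled when
    word_timestamps=True) whose items have `.word`, `.start` and `.end`.
    """
    result = []
    for segment in segments:
        for w in segment.words or []:
            # Word texts usually carry a leading space; strip it.
            result.append((w.word.strip(), w.start, w.end))
    return result


# Hypothetical stand-ins that mimic only the fields used above.
Word = namedtuple("Word", ["start", "end", "word", "probability"])
Segment = namedtuple("Segment", ["words"])

demo = [
    Segment(words=[Word(0.0, 0.4, " So", 0.9), Word(0.6, 0.8, " uh", 0.8)]),
    Segment(words=[Word(1.2, 1.5, " yes", 0.95)]),
]
print(flatten_words(demo))  # → [('So', 0.0, 0.4), ('uh', 0.6, 0.8), ('yes', 1.2, 1.5)]
```

Keeping the helper independent of the library makes it easy to reuse the flattened tuples for downstream steps such as subtitle export or pause analysis.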
Implementation Details
The model applies Dynamic Time Warping (DTW) to Whisper's cross-attention scores to align tokens with the audio, and it is trained with a specialized attention loss function that improves timestamp accuracy. Training proceeds in three stages and incorporates WavLM augmentations and specialized data preparation techniques to make performance more robust.
- Implements a custom attention loss for improved timestamp accuracy
- Uses DTW-based alignment for precise word-level timestamps
- Trained on a mixture of English and German datasets
- Incorporates noise augmentation for improved hallucination resistance
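The DTW alignment step can be illustrated on a toy token-by-frame matrix. This is a generic sketch of dynamic time warping, not CrisperWhisper's actual code: it assumes a cost of `1 - attention` per cell (real implementations may preprocess and combine attention heads differently), and the names `dtw_path` and `token_frame_spans` as well as the toy matrix are hypothetical.

```python
from typing import Dict, List, Tuple


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Return the minimum-cost monotonic path through cost[token][frame].

    Allowed steps: diagonal (next token, next frame), next frame only
    (a token spans several frames), next token only (tokens share a frame).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]  # 1-based accumulated cost
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1], acc[i][j - 1], acc[i - 1][j]
            )
    # Backtrack from the bottom-right corner to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = {
            (i - 1, j - 1): acc[i - 1][j - 1],
            (i, j - 1): acc[i][j - 1],
            (i - 1, j): acc[i - 1][j],
        }
        i, j = min(moves, key=moves.get)
    return path[::-1]


def token_frame_spans(path: List[Tuple[int, int]], n_tokens: int) -> List[Tuple[int, int]]:
    """First and last frame assigned to each token along the path."""
    spans: Dict[int, Tuple[int, int]] = {}
    for tok, frame in path:
        lo, hi = spans.get(tok, (frame, frame))
        spans[tok] = (min(lo, frame), max(hi, frame))
    return [spans[t] for t in range(n_tokens)]


# Toy cross-attention weights: 3 tokens x 6 encoder frames (hypothetical).
attention = [
    [0.9, 0.8, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.7, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.9, 0.8],
]
cost = [[1.0 - a for a in row] for row in attention]
path = dtw_path(cost)
spans = token_frame_spans(path, len(attention))
print(spans)  # → [(0, 1), (2, 3), (4, 5)]
```

Multiplying frame indices by the encoder frame duration (20 ms for Whisper, whose encoder produces 1500 positions per 30 s window) turns these spans into start and end times in seconds.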
Core Capabilities
- Accurate word-level timestamp generation
- Verbatim transcription including fillers and disfluencies
- Sophisticated filler detection and transcription
- Hallucination mitigation through specialized training
- State-of-the-art performance on verbatim datasets (AMI, TED)
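Once word-level timestamps are available, pauses can be located as gaps between consecutive words, and fillers can be counted by matching word text. A minimal sketch: the filler spellings in the hypothetical `FILLERS` set and the 0.5 s pause threshold are illustrative assumptions, not values taken from CrisperWhisper, so they should be checked against the model's actual output.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec)

# Hypothetical filler spellings; adjust to what the model actually emits.
FILLERS = {"uh", "um", "[uh]", "[um]"}


def find_pauses(words: List[Word], min_gap: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) of silences of at least min_gap seconds between words."""
    pauses = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        if next_start - prev_end >= min_gap:
            pauses.append((prev_end, next_start))
    return pauses


def count_fillers(words: List[Word]) -> int:
    """Count words whose lowercased, punctuation-stripped text is a known filler."""
    return sum(1 for text, _, _ in words if text.lower().strip(".,!?") in FILLERS)


words = [("So", 0.0, 0.3), ("uh,", 0.4, 0.6), ("I", 1.4, 1.5), ("um", 1.6, 1.9), ("think", 2.0, 2.3)]
print(find_pauses(words))    # → [(0.6, 1.4)]
print(count_fillers(words))  # → 2
```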
Frequently Asked Questions
Q: What makes this model unique?
The model combines verbatim transcription with accurate timestamps, particularly around disfluencies and pauses. It achieves this through custom tokenization and training with a specialized attention loss.
Q: What are the recommended use cases?
This model is ideal for applications requiring exact transcriptions with precise timing, such as subtitle generation, linguistic research, or any scenario where capturing speech patterns, including fillers and disfluencies, is important.
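For subtitle generation, word timestamps can be grouped into cues and written in the SRT format. A minimal sketch, assuming `(word, start, end)` tuples such as those collected from faster-whisper's word output; grouping by a fixed `max_words` count is a simplifying assumption, since subtitle tools usually also consider line length and reading speed.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_sec, end_sec)


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []
    for idx in range(0, len(words), max_words):
        chunk = words[idx : idx + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


words = [("So", 0.0, 0.3), ("uh", 0.4, 0.6), ("I", 1.4, 1.5), ("think", 1.6, 2.05)]
print(words_to_srt(words, max_words=2))
```

Because verbatim output keeps fillers such as "uh", the resulting cues preserve them; filtering them out first would produce cleaner but less faithful subtitles.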