# Reverb ASR
| Property | Value |
|---|---|
| Author | Revai |
| Paper | arXiv:2410.03930 |
| License | See LICENSE in repository |
## What is reverb-asr?
Reverb ASR is an English automatic speech recognition model trained on 200,000 hours of human-transcribed audio, the largest such corpus used for an open-source model to date. It delivers state-of-the-art accuracy along with a distinctive feature: adjustable verbatimicity control.
## Implementation Details
The model implements a joint CTC/attention architecture and supports multiple decoding modes including attention, CTC greedy search, CTC prefix beam search, attention rescoring, and joint decoding. It can be run efficiently on both CPU and GPU hardware, making it versatile for various deployment scenarios.
- Built on modified WeNet architecture
- Supports flexible verbatimicity control (0-1 range)
- Multiple decoding options for optimal results
- Extensive benchmarking tools included
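Of the decoding modes listed above, CTC greedy search is the simplest: pick the most likely symbol in each frame, collapse consecutive repeats, and drop blanks. A minimal sketch in plain Python (an illustration of the general technique, not code from the Reverb repository):

```python
def ctc_greedy_decode(frames, blank=0):
    """Decode per-frame probability rows with CTC greedy search.

    frames: list of per-frame probability lists over the vocabulary.
    Returns the collapsed label sequence with blanks removed.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frames]
    out, prev = [], None
    for s in best_path:
        # Emit a symbol only if it is not blank and not a repeat of the
        # previous frame's symbol (a blank in between resets the repeat rule).
        if s != blank and s != prev:
            out.append(s)
        prev = s
    return out

# Toy example with vocabulary {0: blank, 1: 'a', 2: 'b'}
frames = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 'a' (new emission after the blank)
    [0.1, 0.1, 0.8],    # 'b'
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```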
## Core Capabilities
- Highly accurate English speech recognition
- Adjustable transcript verbatimicity (from clean to fully verbatim)
- Support for multiple decoding strategies
- Efficient processing on both CPU and GPU
- Handles disfluencies and speech variations
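Among the decoding strategies supported, CTC prefix beam search merges all alignments that collapse to the same prefix instead of tracking a single best path. A compact sketch of the standard algorithm (shown in probability space for readability; production decoders, including WeNet's, work in log space):

```python
from collections import defaultdict

def ctc_prefix_beam_search(frames, beam_size=4, blank=0):
    """Standard CTC prefix beam search over per-frame probability rows.

    For each prefix we track the probability of ending in blank (pb) and
    in a non-blank symbol (pnb), so alignments that collapse to the same
    prefix are merged rather than competing as separate paths.
    """
    beams = {(): (1.0, 0.0)}  # prefix -> (pb, pnb)
    for probs in frames:
        nxt = defaultdict(lambda: (0.0, 0.0))
        for prefix, (pb, pnb) in beams.items():
            for s, p in enumerate(probs):
                if s == blank:
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b + (pb + pnb) * p, nb)
                elif prefix and s == prefix[-1]:
                    # Repeat of the last symbol: staying on the same prefix
                    # continues the non-blank path; only a preceding blank
                    # lets the symbol start a genuinely new emission.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, nb + pnb * p)
                    b2, nb2 = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b2, nb2 + pb * p)
                else:
                    b2, nb2 = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b2, nb2 + (pb + pnb) * p)
        # Prune to the top `beam_size` prefixes by total probability.
        beams = dict(sorted(nxt.items(),
                            key=lambda kv: -(kv[1][0] + kv[1][1]))[:beam_size])
    return max(beams, key=lambda k: sum(beams[k]))

# Vocabulary {0: blank, 1: 'a'}: both frames slightly favour 'a'.
print(ctc_prefix_beam_search([[0.4, 0.6], [0.4, 0.6]]))  # (1,)
```

Merging alignment paths is why prefix beam search can outperform greedy search: several moderately likely paths that spell the same text pool their probability mass.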
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's uniqueness lies in its training on the largest corpus of human-transcribed audio ever used for an open-source model, combined with its adjustable verbatimicity feature that allows users to control the level of transcript detail.
**Q: What are the recommended use cases?**
The model suits both clean, readable transcription and specialized use cases such as audio editing, which require capturing every spoken word, including hesitations and re-wordings. The adjustable verbatimicity makes it versatile across these applications.
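As a rough illustration of what the verbatimicity knob means at the output level: the real model conditions its decoder on the requested value at inference time, but the effect can be mimicked with a toy filter. The token weights and thresholding rule below are invented for the example, not part of Reverb's API:

```python
def apply_verbatimicity(tokens, verbatimicity):
    """Toy filter: keep each token whose disfluency weight is covered by
    the requested verbatimicity level (0.0 = clean, 1.0 = fully verbatim).

    tokens: list of (word, disfluency_weight) pairs, where weight 0.0
    marks core content and higher weights mark disfluencies. Both the
    weights and the thresholding rule are illustrative inventions.
    """
    return [word for word, weight in tokens if weight <= verbatimicity]

tokens = [
    ("I", 0.0), ("um", 1.0), ("I", 0.6), ("mean", 0.6),
    ("we", 0.0), ("should", 0.0), ("ship", 0.0), ("it", 0.0),
]
print(" ".join(apply_verbatimicity(tokens, 0.0)))  # I we should ship it
print(" ".join(apply_verbatimicity(tokens, 1.0)))  # I um I mean we should ship it
```

Values between 0 and 1 produce intermediate transcripts, e.g. keeping a self-correction ("I mean") while dropping filler words.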