# Reverb ASR
| Property | Value |
|---|---|
| Author | Revai |
| Paper | arXiv:2410.03930 |
| License | See LICENSE in repository |
## What is reverb-asr?
Reverb ASR is an English automatic speech recognition model trained on 200,000 hours of human-transcribed audio, the largest such corpus used for an open-source model to date. It delivers state-of-the-art accuracy along with a distinctive feature: adjustable verbatimicity control.
## Implementation Details
The model implements a joint CTC/attention architecture and supports multiple decoding modes including attention, CTC greedy search, CTC prefix beam search, attention rescoring, and joint decoding. It can be run efficiently on both CPU and GPU hardware, making it versatile for various deployment scenarios.
- Built on modified WeNet architecture
- Supports flexible verbatimicity control (0-1 range)
- Multiple decoding options for optimal results
- Extensive benchmarking tools included
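Of the decoding modes listed above, CTC greedy search is the simplest: pick the most likely symbol in each frame, collapse consecutive repeats, and drop blanks. A minimal sketch in plain Python (an illustration of the general technique, not code from the Reverb repository):

```python
def ctc_greedy_decode(frames, blank=0):
    """Decode per-frame probability rows with CTC greedy search.

    frames: list of per-frame probability lists over the vocabulary.
    Returns the collapsed label sequence with blanks removed.
    """
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frames]
    out, prev = [], None
    for s in best_path:
        # Emit a symbol only if it is not blank and not a repeat of the
        # previous frame's symbol (a blank in between resets the repeat rule).
        if s != blank and s != prev:
            out.append(s)
        prev = s
    return out

# Toy example with vocabulary {0: blank, 1: 'a', 2: 'b'}
frames = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 'a' (new emission after the blank)
    [0.1, 0.1, 0.8],    # 'b'
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```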
## Core Capabilities
- Highly accurate English speech recognition
- Adjustable transcript verbatimicity (from clean to fully verbatim)
- Support for multiple decoding strategies
- Efficient processing on both CPU and GPU
- Handles disfluencies and speech variations
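Among the decoding strategies supported, CTC prefix beam search merges all alignments that collapse to the same prefix instead of tracking a single best path. A compact sketch of the standard algorithm (shown in probability space for readability; production decoders, including WeNet's, work in log space):

```python
from collections import defaultdict

def ctc_prefix_beam_search(frames, beam_size=4, blank=0):
    """Standard CTC prefix beam search over per-frame probability rows.

    For each prefix we track the probability of ending in blank (pb) and
    in a non-blank symbol (pnb), so alignments that collapse to the same
    prefix are merged rather than competing as separate paths.
    """
    beams = {(): (1.0, 0.0)}  # prefix -> (pb, pnb)
    for probs in frames:
        nxt = defaultdict(lambda: (0.0, 0.0))
        for prefix, (pb, pnb) in beams.items():
            for s, p in enumerate(probs):
                if s == blank:
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b + (pb + pnb) * p, nb)
                elif prefix and s == prefix[-1]:
                    # Repeat of the last symbol: staying on the same prefix
                    # continues the non-blank path; only a preceding blank
                    # lets the symbol start a genuinely new emission.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, nb + pnb * p)
                    b2, nb2 = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b2, nb2 + pb * p)
                else:
                    b2, nb2 = nxt[prefix + (s,)]
                    nxt[prefix + (s,)] = (b2, nb2 + (pb + pnb) * p)
        # Prune to the top `beam_size` prefixes by total probability.
        beams = dict(sorted(nxt.items(),
                            key=lambda kv: -(kv[1][0] + kv[1][1]))[:beam_size])
    return max(beams, key=lambda k: sum(beams[k]))

# Vocabulary {0: blank, 1: 'a'}: both frames slightly favour 'a'.
print(ctc_prefix_beam_search([[0.4, 0.6], [0.4, 0.6]]))  # (1,)
```

Merging alignment paths is why prefix beam search can outperform greedy search: several moderately likely paths that spell the same text pool their probability mass.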
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's uniqueness lies in its training on the largest corpus of human-transcribed audio ever used for an open-source model, combined with its adjustable verbatimicity feature that allows users to control the level of transcript detail.
**Q: What are the recommended use cases?**
The model suits both clean, readable transcription and specialized use cases such as audio editing, which require capturing every spoken word, including hesitations and re-wordings. The adjustable verbatimicity makes it versatile across these applications.
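As a rough illustration of what the verbatimicity knob means at the output level: the real model conditions its decoder on the requested value at inference time, but the effect can be mimicked with a toy filter. The token weights and thresholding rule below are invented for the example, not part of Reverb's API:

```python
def apply_verbatimicity(tokens, verbatimicity):
    """Toy filter: keep each token whose disfluency weight is covered by
    the requested verbatimicity level (0.0 = clean, 1.0 = fully verbatim).

    tokens: list of (word, disfluency_weight) pairs, where weight 0.0
    marks core content and higher weights mark disfluencies. Both the
    weights and the thresholding rule are illustrative inventions.
    """
    return [word for word, weight in tokens if weight <= verbatimicity]

tokens = [
    ("I", 0.0), ("um", 1.0), ("I", 0.6), ("mean", 0.6),
    ("we", 0.0), ("should", 0.0), ("ship", 0.0), ("it", 0.0),
]
print(" ".join(apply_verbatimicity(tokens, 0.0)))  # I we should ship it
print(" ".join(apply_verbatimicity(tokens, 1.0)))  # I um I mean we should ship it
```

Values between 0 and 1 produce intermediate transcripts, e.g. keeping a self-correction ("I mean") while dropping filler words.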