reverb-diarization-v2

Revai

Speaker diarization model with a 22.25% relative WDER improvement over pyannote3.0, measured across 1.25M+ tokens, with its best reported result on earnings calls

Property	Value
Author	Revai
Paper	arXiv:2410.03930
Best WDER	0.046 (earnings21 dataset)

What is reverb-diarization-v2?

Reverb-diarization-v2 is a speaker diarization model developed by Revai. It achieves a 22.25% relative improvement in Word Diarization Error Rate (WDER, lower is better) over the baseline pyannote3.0 model, measured on a test suite of over 1.25 million tokens.
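A relative improvement is a ratio of error rates, not a difference in percentage points. The sketch below shows the arithmetic. The two input values are hypothetical placeholders chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_wder: float, new_wder: float) -> float:
    """Return the fraction by which new_wder is lower than baseline_wder."""
    if baseline_wder <= 0:
        raise ValueError("baseline_wder must be positive")
    return (baseline_wder - new_wder) / baseline_wder


# Hypothetical error rates, for illustration only (not taken from the paper).
baseline = 0.100
improved = 0.080

# A drop from 0.100 to 0.080 is 2 percentage points, but a 20% relative gain.
print(f"{relative_improvement(baseline, improved):.2%}")
```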

Implementation Details

The model is built on the pyannote.audio framework and loads through its Pipeline interface, so it fits into existing audio processing pipelines. It can write results in the standard RTTM format, and loading it requires a HuggingFace access token.
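As a rough sketch of that workflow, the function below loads the pipeline, runs it on one file, and writes RTTM. It rests on several assumptions: the repository ID "Revai/reverb-diarization-v2" is inferred from the author and model name, the `use_auth_token` keyword matches pyannote.audio 3.x (newer releases may name it differently), and `diarize_to_rttm` is a hypothetical helper name, not part of any library.

```python
def diarize_to_rttm(audio_path: str, rttm_path: str, hf_token: str) -> None:
    """Run speaker diarization on one audio file and save the result as RTTM."""
    # Imported lazily so this module loads even without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumed repository ID; check the model page before relying on it.
    # `use_auth_token` is the keyword used by pyannote.audio 3.x.
    pipeline = Pipeline.from_pretrained(
        "Revai/reverb-diarization-v2",
        use_auth_token=hf_token,
    )

    # In pyannote.audio 3.x the result is a pyannote.core.Annotation.
    diarization = pipeline(audio_path)

    with open(rttm_path, "w") as rttm_file:
        diarization.write_rttm(rttm_file)
```

A call might look like `diarize_to_rttm("audio.wav", "audio.rttm", os.environ["HF_TOKEN"])`, where the file names and the environment variable are placeholders.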

Simple integration with pyannote.audio Pipeline
Support for multiple audio formats
RTTM format output capability
Authenticated access through HuggingFace
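For downstream use of the RTTM output, the following is a minimal parser sketch. It assumes the standard 10-field SPEAKER record layout (type, file ID, channel, onset, duration, then placeholder fields around the speaker name). `Turn` and `parse_rttm_line` are illustrative names, and the sample line is made up.

```python
from typing import NamedTuple


class Turn(NamedTuple):
    file_id: str
    start: float      # onset in seconds
    duration: float   # length in seconds
    speaker: str


def parse_rttm_line(line: str) -> Turn:
    """Parse one SPEAKER record from an RTTM file."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "SPEAKER":
        raise ValueError(f"not a 10-field SPEAKER record: {line!r}")
    return Turn(
        file_id=fields[1],
        start=float(fields[3]),
        duration=float(fields[4]),
        speaker=fields[7],
    )


# Made-up record, for illustration only.
sample = "SPEAKER call_001 1 0.50 2.25 <NA> <NA> SPEAKER_00 <NA> <NA>"
turn = parse_rttm_line(sample)
print(turn.speaker, turn.start, turn.start + turn.duration)  # SPEAKER_00 0.5 2.75
```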

Core Capabilities

WDER of 0.046 on the earnings21 dataset, its best reported result
Robust performance across different test suites
Specialized optimization for earnings calls and professional audio
Streamlined deployment process
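To give a feel for what a WDER figure measures, here is a simplified sketch: among words aligned between the reference and hypothesis transcripts, it counts the fraction attributed to the wrong speaker. It assumes the word alignment and the mapping between hypothesis and reference speaker labels have already been done, and it is not the evaluation code behind the reported numbers. The labels in the example are made up.

```python
from typing import Sequence


def wder(ref_speakers: Sequence[str], hyp_speakers: Sequence[str]) -> float:
    """Fraction of aligned words whose hypothesis speaker label is wrong.

    Both sequences hold one speaker label per aligned word, in the same order.
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("sequences must cover the same aligned words")
    if not ref_speakers:
        raise ValueError("at least one aligned word is required")
    wrong = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return wrong / len(ref_speakers)


# Made-up labels: the second word is attributed to the wrong speaker.
ref = ["A", "A", "B", "B", "B"]
hyp = ["A", "B", "B", "B", "B"]
print(wder(ref, hyp))  # 0.2
```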

Frequently Asked Questions

Q: What makes this model unique?

The model's standout feature is its WDER: a 22.25% relative improvement over the widely used pyannote3.0 baseline. It is particularly effective on professional audio such as earnings calls.

Q: What are the recommended use cases?

The model is best suited to professional audio, particularly earnings calls and similar recordings, as shown by its 0.046 WDER on the earnings21 dataset. It fits applications that need high-accuracy speaker diarization in professional settings.