NVIDIA TitaNet-Large Speaker Verification Model
| Property | Value |
|---|---|
| Parameter Count | 23M |
| License | CC-BY-4.0 |
| Language | English |
| Performance | 0.66% EER on VoxCeleb1 |
What is speakerverification_en_titanet_large?
TitaNet-Large is a speaker verification model developed by NVIDIA, built on a depth-wise separable Conv1D architecture. It is designed to extract speaker embeddings from speech and serves as a backbone for speaker verification and diarization tasks. With 23M parameters, it is the "large" variant of the TitaNet architecture family.
Implementation Details
The model is implemented with the NVIDIA NeMo toolkit and operates on 16 kHz mono-channel audio input. It uses depth-wise separable convolutions with global context to generate speaker embeddings, and was trained on a combination of datasets including VoxCeleb-1, VoxCeleb-2, Fisher, Switchboard, LibriSpeech, and SRE. A minimal loading example follows the performance notes below.
- Achieves 0.66% EER on the VoxCeleb1 cleaned trial file
- Demonstrates strong diarization performance, with DER as low as 1.19% on the CH109 dataset
- Supports both telephonic and non-telephonic speech
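The following is a minimal loading sketch using NeMo's `EncDecSpeakerLabelModel`; the audio file path is an illustrative placeholder rather than a file shipped with the model.

```python
# Minimal sketch: load the pretrained TitaNet-Large checkpoint through NeMo
# and extract a speaker embedding from a 16 kHz mono-channel WAV file.
# Assumes nemo_toolkit[asr] is installed; the file path is a placeholder.
import nemo.collections.asr as nemo_asr

speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    model_name="titanet_large"
)

# get_embedding expects a path to a 16 kHz, mono-channel audio file.
embedding = speaker_model.get_embedding("sample_16khz_mono.wav")
print(embedding.shape)  # speaker embedding tensor for the utterance
```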
Core Capabilities
- Speaker embedding extraction from audio files
- Speaker verification between two utterances (see the usage sketch after this list)
- Batch processing for multiple audio files
- Support for speaker diarization tasks
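Under the same assumptions as the loading sketch above (NeMo installed, placeholder file paths), pairwise verification and simple batch embedding extraction might look like this:

```python
# Sketch: pairwise speaker verification and batch embedding extraction.
# All audio paths below are placeholders.
import nemo.collections.asr as nemo_asr

speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    model_name="titanet_large"
)

# Verification between two utterances: compares the similarity of the two
# embeddings against a decision threshold and returns True/False.
same_speaker = speaker_model.verify_speakers(
    "speaker1_utt1.wav", "speaker1_utt2.wav"
)
print("Same speaker:", same_speaker)

# Simple batch processing: extract one embedding per file.
audio_files = ["speaker1_utt1.wav", "speaker2_utt1.wav", "speaker3_utt1.wav"]
embeddings = [speaker_model.get_embedding(path) for path in audio_files]
```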
Frequently Asked Questions
Q: What makes this model unique?
The combination of depth-wise separable convolutions with global context, together with training on six diverse datasets, makes the model particularly robust for speaker verification tasks. Its performance metrics, especially the 0.66% EER on VoxCeleb1, demonstrate state-of-the-art capabilities.
Q: What are the recommended use cases?
The model is well suited to speaker verification in security systems, speaker diarization for meeting transcription, and voice-based authentication. It performs well in both telephonic and non-telephonic environments, though fine-tuning may be necessary for specific domains.