DiarizationLM-8b-Fisher-v2
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Llama 3 |
| Research Paper | arXiv:2401.03506 |
| Training Data | Fisher Corpus |
| Max Sequence Length | 4096 tokens |
What is DiarizationLM-8b-Fisher-v2?
DiarizationLM-8b-Fisher-v2 is an advanced language model specifically designed for speaker diarization post-processing. Built on the Llama 3 architecture, this model represents a significant improvement over its predecessor (v1) by focusing the loss computation exclusively on completion tokens, resulting in enhanced performance for speaker attribution tasks.
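In practice, the model is used as a causal LM that rewrites a diarized transcript. Below is a minimal inference sketch with Hugging Face transformers; the repository id and the `<speaker:k> ... --> ` prompt convention follow the DiarizationLM paper's format but should be treated as assumptions until verified against the released model card.

```python
# Hedged sketch: load the model and run one post-processing step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/DiarizationLM-8b-Fisher-v2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A diarized hypothesis: words carry (possibly wrong) speaker labels
# produced by a first-pass acoustic diarization system.
prompt = "<speaker:1> Hello how are you <speaker:2> doing today I am fine --> "

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # expected: the same words with corrected speaker labels
```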
Implementation Details
The model was finetuned on the Fisher corpus using a LoRA adapter with rank 256, giving 671,088,640 trainable parameters. Training ran for 28,800 steps (approximately 9 epochs) with a batch size of 16 on an NVIDIA A100 GPU. The model uses a 'mixed' data flavor that combines the 'hyp2ora' and 'deg2ref' variants, yielding 51,063 prompt-completion pairs; a configuration sketch follows the list below.
- Maximum prompt length: 6000 characters
- Training duration: 4+ days on A100 GPU
- Tensor type: BF16
- Architecture: Llama-based with LoRA adaptation
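The setup above maps naturally onto a PEFT/LoRA configuration. The sketch below is illustrative rather than the released training code: the base checkpoint name and lora_alpha are assumptions, while targeting all attention and MLP projections at rank 256 is consistent with the stated 671,088,640 trainable parameters.

```python
# Hedged sketch of a rank-256 LoRA setup matching the figures above;
# target_modules and lora_alpha are assumptions, not values from the release.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # assumed base checkpoint
    torch_dtype=torch.bfloat16,      # BF16, as listed above
)

lora_config = LoraConfig(
    r=256,                           # rank stated in the model card
    lora_alpha=512,                  # illustrative assumption
    target_modules=[                 # all attention + MLP projections (assumption)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # reports roughly 671M trainable parameters
```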
Core Capabilities
- Improved WDER (Word Diarization Error Rate) compared to the baseline (a simplified WDER computation is sketched after this list):
  - Fisher test set: 3.28% WDER (baseline: 5.32%)
  - Callhome test set: 6.66% WDER (baseline: 7.72%)
- Efficient speaker attribution and diarization post-processing
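WDER measures the fraction of recognized words attributed to the wrong speaker. The sketch below is a simplified version that assumes hypothesis words are already aligned one-to-one with the reference words; the paper's full definition additionally handles ASR errors via word alignment.

```python
# Hedged sketch: simplified WDER over pre-aligned word sequences.
from itertools import permutations

def simple_wder(ref_speakers, hyp_speakers):
    """Fraction of words with a wrong speaker label, minimized over all
    mappings from hypothesis speakers to reference speakers."""
    assert len(ref_speakers) == len(hyp_speakers) and ref_speakers
    ref_labels = sorted(set(ref_speakers))
    hyp_labels = sorted(set(hyp_speakers))
    best_errors = len(ref_speakers)
    # Brute-force the optimal speaker mapping; fine for two-speaker telephone
    # conversations such as Fisher and Callhome (assumes the hypothesis does
    # not use more speakers than the reference).
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        errors = sum(mapping[h] != r for h, r in zip(hyp_speakers, ref_speakers))
        best_errors = min(best_errors, errors)
    return best_errors / len(ref_speakers)

# One of six words is attributed to the wrong speaker -> WDER = 1/6
print(simple_wder(["1", "1", "1", "2", "2", "2"],
                  ["A", "A", "B", "B", "B", "B"]))
```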
Frequently Asked Questions
Q: What makes this model unique?
Unlike v1, this model computes the training loss only on completion tokens. Combined with extensive finetuning on the Fisher corpus, this yields stronger speaker diarization performance; the masking idea is sketched below.
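A minimal sketch of what completion-only loss masking typically looks like in a Hugging Face-style pipeline, assuming the standard -100 ignore index; this illustrates the idea rather than reproducing the actual training code.

```python
# Hedged sketch: prompt tokens are masked out of the labels so that
# cross-entropy is computed only over the completion.
def build_example(tokenizer, prompt, completion):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + completion_ids
    # -100 is the ignore index used by PyTorch cross-entropy / HF causal-LM heads.
    labels = [-100] * len(prompt_ids) + completion_ids
    return {"input_ids": input_ids, "labels": labels}
```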
Q: What are the recommended use cases?
The model is specifically designed for post-processing speaker diarization tasks, particularly in scenarios requiring accurate speaker attribution in conversational transcripts. It's ideal for applications in speech recognition systems where speaker identification is crucial.
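As a concrete orchestration step, the word-level output of an ASR-plus-diarization front end has to be serialized into the model's text prompt. The `<speaker:k>` tags and trailing ` --> ` follow the DiarizationLM prompt format described in the paper and are assumptions here; the helper name is hypothetical.

```python
# Hedged sketch: serialize word-level ASR + diarization output into a prompt.
def build_prompt(words, speakers):
    """words: ASR words in order; speakers: parallel list of speaker ids."""
    parts, current = [], None
    for word, spk in zip(words, speakers):
        if spk != current:
            parts.append(f"<speaker:{spk}>")  # emit a tag on each speaker change
            current = spk
        parts.append(word)
    return " ".join(parts) + " --> "

print(build_prompt(["hello", "how", "are", "you", "good", "thanks"],
                   [1, 1, 1, 1, 2, 2]))
# -> "<speaker:1> hello how are you <speaker:2> good thanks --> "
```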