DiarizationLM-8b-Fisher-v2
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | Llama 3 |
| Research Paper | arXiv:2401.03506 |
| Training Data | Fisher Corpus |
| Max Sequence Length | 4096 tokens |
What is DiarizationLM-8b-Fisher-v2?
DiarizationLM-8b-Fisher-v2 is an advanced language model specifically designed for speaker diarization post-processing. Built on the Llama 3 architecture, this model represents a significant improvement over its predecessor (v1) by focusing the loss computation exclusively on completion tokens, resulting in enhanced performance for speaker attribution tasks.
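In practice, the model is used as a causal LM that rewrites a diarized transcript. Below is a minimal inference sketch with Hugging Face transformers; the repository id and the `<speaker:k> ... --> ` prompt convention follow the DiarizationLM paper's format but should be treated as assumptions until verified against the released model card.

```python
# Hedged sketch: load the model and run one post-processing step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/DiarizationLM-8b-Fisher-v2"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A diarized hypothesis: words carry (possibly wrong) speaker labels
# produced by a first-pass acoustic diarization system.
prompt = "<speaker:1> Hello how are you <speaker:2> doing today I am fine --> "

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
completion = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)  # expected: the same words with corrected speaker labels
```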
Implementation Details
The model was finetuned on the Fisher corpus using a LoRA adapter with rank 256, giving 671,088,640 trainable parameters. Training ran for 28,800 steps (approximately 9 epochs) with a batch size of 16 on an NVIDIA A100 GPU. The model uses a 'mixed' data flavor that combines the 'hyp2ora' and 'deg2ref' variants, yielding 51,063 prompt-completion pairs; a configuration sketch follows the list below.
- Maximum prompt length: 6000 characters
- Training duration: 4+ days on A100 GPU
- Tensor type: BF16
- Architecture: Llama-based with LoRA adaptation
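The setup above maps naturally onto a PEFT/LoRA configuration. The sketch below is illustrative rather than the released training code: the base checkpoint name and lora_alpha are assumptions, while targeting all attention and MLP projections at rank 256 is consistent with the stated 671,088,640 trainable parameters.

```python
# Hedged sketch of a rank-256 LoRA setup matching the figures above;
# target_modules and lora_alpha are assumptions, not values from the release.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # assumed base checkpoint
    torch_dtype=torch.bfloat16,      # BF16, as listed above
)

lora_config = LoraConfig(
    r=256,                           # rank stated in the model card
    lora_alpha=512,                  # illustrative assumption
    target_modules=[                 # all attention + MLP projections (assumption)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()   # reports roughly 671M trainable parameters
```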
Core Capabilities
- Improved WDER (Word Diarization Error Rate) compared to the baseline (a simplified WDER computation is sketched after this list):
  - Fisher test set: 3.28% WDER (baseline: 5.32%)
  - Callhome test set: 6.66% WDER (baseline: 7.72%)
- Efficient speaker attribution and diarization post-processing
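WDER measures the fraction of recognized words attributed to the wrong speaker. The sketch below is a simplified version that assumes hypothesis words are already aligned one-to-one with the reference words; the paper's full definition additionally handles ASR errors via word alignment.

```python
# Hedged sketch: simplified WDER over pre-aligned word sequences.
from itertools import permutations

def simple_wder(ref_speakers, hyp_speakers):
    """Fraction of words with a wrong speaker label, minimized over all
    mappings from hypothesis speakers to reference speakers."""
    assert len(ref_speakers) == len(hyp_speakers) and ref_speakers
    ref_labels = sorted(set(ref_speakers))
    hyp_labels = sorted(set(hyp_speakers))
    best_errors = len(ref_speakers)
    # Brute-force the optimal speaker mapping; fine for two-speaker telephone
    # conversations such as Fisher and Callhome (assumes the hypothesis does
    # not use more speakers than the reference).
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        errors = sum(mapping[h] != r for h, r in zip(hyp_speakers, ref_speakers))
        best_errors = min(best_errors, errors)
    return best_errors / len(ref_speakers)

# One of six words is attributed to the wrong speaker -> WDER = 1/6
print(simple_wder(["1", "1", "1", "2", "2", "2"],
                  ["A", "A", "B", "B", "B", "B"]))
```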
Frequently Asked Questions
Q: What makes this model unique?
Unlike v1, this model computes the training loss only on completion tokens. Combined with extensive finetuning on the Fisher corpus, this yields stronger speaker diarization performance; the masking idea is sketched below.
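A minimal sketch of what completion-only loss masking typically looks like in a Hugging Face-style pipeline, assuming the standard -100 ignore index; this illustrates the idea rather than reproducing the actual training code.

```python
# Hedged sketch: prompt tokens are masked out of the labels so that
# cross-entropy is computed only over the completion.
def build_example(tokenizer, prompt, completion):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + completion_ids
    # -100 is the ignore index used by PyTorch cross-entropy / HF causal-LM heads.
    labels = [-100] * len(prompt_ids) + completion_ids
    return {"input_ids": input_ids, "labels": labels}
```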
Q: What are the recommended use cases?
The model is specifically designed for post-processing speaker diarization tasks, particularly in scenarios requiring accurate speaker attribution in conversational transcripts. It's ideal for applications in speech recognition systems where speaker identification is crucial.
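As a concrete orchestration step, the word-level output of an ASR-plus-diarization front end has to be serialized into the model's text prompt. The `<speaker:k>` tags and trailing ` --> ` follow the DiarizationLM prompt format described in the paper and are assumptions here; the helper name is hypothetical.

```python
# Hedged sketch: serialize word-level ASR + diarization output into a prompt.
def build_prompt(words, speakers):
    """words: ASR words in order; speakers: parallel list of speaker ids."""
    parts, current = [], None
    for word, spk in zip(words, speakers):
        if spk != current:
            parts.append(f"<speaker:{spk}>")  # emit a tag on each speaker change
            current = spk
        parts.append(word)
    return " ".join(parts) + " --> "

print(build_prompt(["hello", "how", "are", "you", "good", "thanks"],
                   [1, 1, 1, 1, 2, 2]))
# -> "<speaker:1> hello how are you <speaker:2> good thanks --> "
```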