DiarizationLM-8b-Fisher-v2
Property | Value |
---|---|
Parameter Count | 8.03B |
License | Llama3 |
Architecture | Llama-3 based |
Research Paper | View Paper |
Tensor Type | BF16 |
What is DiarizationLM-8b-Fisher-v2?
DiarizationLM-8b-Fisher-v2 is a language model for speaker diarization post-processing. It is built on the Llama-3 architecture and differs from its predecessor (v1) in one main respect: the training loss is computed only on completion tokens. The model was finetuned on the Fisher corpus using a LoRA adapter of rank 256, which gives 671,088,640 trainable parameters.
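The reported adapter size can be checked by hand. The sketch below assumes the standard Llama-3 8B shapes (32 layers, hidden size 4096, key/value projection width 1024, MLP size 14336) and assumes the adapter is attached to all seven linear projections in every layer. Neither assumption is stated in this document, but together they reproduce the reported count.

```python
# Assumed Llama-3 8B shapes (not stated in this document).
NUM_LAYERS = 32
HIDDEN = 4096
KV_DIM = 1024      # assumed: 8 key/value heads x 128 dims
MLP_DIM = 14336
RANK = 256         # LoRA rank given in the text

# Assumed target modules, as (in_features, out_features).
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices per module: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


def total_lora_params(rank: int = RANK) -> int:
    """Sum the adapter parameters over all target modules and layers."""
    per_layer = sum(lora_params(i, o, rank) for i, o in TARGETS.values())
    return per_layer * NUM_LAYERS


print(total_lora_params())  # 671088640
```

Under these assumptions each layer contributes 20,971,520 adapter parameters, and 32 layers give the 671,088,640 quoted above.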
Implementation Details
The model was trained for 28,800 steps (approximately 9 epochs) with a batch size of 16. Training ran on a Google Cloud VM instance with an NVIDIA A100 GPU and took over 4 days. The model accepts prompts of up to 6,000 characters and has a maximum sequence length of 4,096 tokens.
- Training data used the mixed flavor, which combines hyp2ora and deg2ref data
- 51,063 prompt-completion pairs in the training set
- Implements LoRA adaptation with rank 256
- Supports both Safetensors and GGUF formats
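The epoch figure follows from the step count, batch size, and training set size given above. This short check assumes that the batch size of 16 is the effective batch size per optimizer step (no gradient accumulation), which the document does not state.

```python
STEPS = 28_800          # training steps from the text
BATCH_SIZE = 16         # assumed to be the effective batch size per step
TRAIN_PAIRS = 51_063    # prompt-completion pairs in the training set


def epochs_seen(steps: int, batch_size: int, dataset_size: int) -> float:
    """Number of passes over the dataset after `steps` optimizer steps."""
    return steps * batch_size / dataset_size


epochs = epochs_seen(STEPS, BATCH_SIZE, TRAIN_PAIRS)
print(f"{epochs:.2f}")  # 9.02
```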
Core Capabilities
- 3.28% WDER (word diarization error rate) on the Fisher test set
- 18.37% cpWER (concatenated minimum-permutation word error rate) on the Fisher test set
- 6.66% WDER on the Callhome test set
- Targets speaker-label correction in diarized transcripts rather than general text generation
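To give a feel for what WDER measures, here is a deliberately simplified sketch. It assumes the reference and hypothesis words are already aligned one-to-one (no insertions or deletions), so the metric reduces to the fraction of words whose speaker label is wrong under the best speaker mapping. The function name `simple_wder` is hypothetical, and this is not the evaluation code behind the numbers above.

```python
from itertools import permutations


def simple_wder(ref_speakers, hyp_speakers):
    """Fraction of aligned words with the wrong speaker label.

    Hypothesis speaker labels are arbitrary, so every one-to-one mapping
    from hypothesis labels to reference labels is tried and the mapping
    with the fewest errors is kept (brute force, fine for a few speakers).
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("inputs must be aligned word-for-word")
    if not ref_speakers:
        return 0.0

    ref_labels = list(dict.fromkeys(ref_speakers))
    hyp_labels = list(dict.fromkeys(hyp_speakers))
    # Pad with None so extra hypothesis speakers can stay unmapped.
    padded = ref_labels + [None] * max(0, len(hyp_labels) - len(ref_labels))

    best_errors = len(ref_speakers)
    for perm in permutations(padded, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        errors = sum(
            mapping[h] != r for r, h in zip(ref_speakers, hyp_speakers)
        )
        best_errors = min(best_errors, errors)
    return best_errors / len(ref_speakers)


# Six aligned words; the hypothesis swaps the label names and
# assigns the third word to the wrong speaker.
ref = [1, 1, 1, 2, 2, 2]
hyp = [2, 2, 1, 1, 1, 1]
print(f"{simple_wder(ref, hyp):.3f}")  # 0.167
```

A full implementation would first align the hypothesis and reference word sequences and would use a more efficient assignment method than brute-force permutation.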
Frequently Asked Questions
Q: What makes this model unique?
This model differs from its predecessor (v1) by computing the training loss only on completion tokens, which is reported to improve performance on speaker diarization tasks. It is a specialized model for correcting speaker labels in conversational transcripts, not a general-purpose assistant.
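Completion-only loss is commonly implemented by masking the prompt positions in the label sequence. The sketch below is a minimal illustration, not the author's training code: it uses -100, the default `ignore_index` of PyTorch's cross-entropy loss, and the token IDs are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(prompt_ids, completion_ids, completion_only=True):
    """Return (input_ids, labels) for one prompt-completion pair.

    With completion_only=True, prompt positions are masked so that only
    completion tokens contribute to the loss. With completion_only=False,
    every token is a training target.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    if completion_only:
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    else:
        labels = list(input_ids)
    return input_ids, labels


# Hypothetical token IDs, for illustration only.
prompt = [101, 7, 8, 9]
completion = [7, 8, 9, 102]

input_ids, labels = build_labels(prompt, completion)
print(input_ids)  # [101, 7, 8, 9, 7, 8, 9, 102]
print(labels)     # [-100, -100, -100, -100, 7, 8, 9, 102]
```

With this masking the model is never rewarded for reproducing the prompt and is trained only on producing the corrected output.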
Q: What are the recommended use cases?
The model is designed for speaker diarization post-processing, so it suits applications that need accurate speaker attribution in transcripts of conversational audio. It is most relevant for multi-speaker conversations where the initial diarization output contains speaker-label errors.
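Because prompts are limited to 6,000 characters, a long transcript has to be split before it is sent to the model. The following sketch shows one simple way to do this on word boundaries. The helper `split_into_prompts` is hypothetical, and any speaker tags, prompt suffix, or merging of the per-chunk outputs that a real pipeline needs are outside its scope.

```python
MAX_PROMPT_CHARS = 6000  # prompt limit given in the model description


def split_into_prompts(words, max_chars=MAX_PROMPT_CHARS):
    """Greedily pack whitespace-joined words into chunks of <= max_chars.

    A single word longer than max_chars is emitted as its own
    (oversized) chunk rather than being cut in the middle.
    """
    prompts, current, length = [], [], 0
    for word in words:
        # One extra character for the joining space, except at chunk start.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            prompts.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        prompts.append(" ".join(current))
    return prompts


# 1,000 ten-character words need two prompts under the 6,000 limit.
words = ["a" * 10] * 1000
prompts = split_into_prompts(words)
print(len(prompts))                  # 2
print(max(len(p) for p in prompts))  # 5994
```

If the transcript carries inline speaker tags, a chunk boundary can separate a tag from the words it applies to, so a production splitter would likely need to repeat the active speaker tag at the start of each chunk.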