DiarizationLM-8b-Fisher-v2

Maintained By
google

Property: Value
Parameter Count: 8.03B
License: Llama3
Architecture: Llama-3 based
Research Paper: View Paper
Tensor Type: BF16

What is DiarizationLM-8b-Fisher-v2?

DiarizationLM-8b-Fisher-v2 is a language model for speaker diarization post-processing. Built on the Llama-3 architecture, it differs from its predecessor by computing the training loss only on completion tokens. It was fine-tuned on the Fisher corpus using a LoRA adapter of rank 256, which gives 671,088,640 trainable parameters.
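
The trainable-parameter figure can be reproduced with a short calculation. The sketch below rests on two assumptions that are not stated in this card: that the adapter is attached to all seven linear projections in every transformer layer, and that the base model has the usual Llama-3 8B shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers). The helper name lora_params is hypothetical.

```python
# Assumed Llama-3 8B shapes (not stated in this card).
HIDDEN = 4096
MLP = 14336
KV = 1024      # 8 key/value heads x 128 dimensions each
LAYERS = 32
RANK = 256     # LoRA rank, taken from the card

# (in_features, out_features) per adapted projection.
# Assumption: all seven linear projections carry an adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(rank, projections, layers):
    """Count LoRA weights: each adapter adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in projections.values())
    return per_layer * layers


total = lora_params(RANK, PROJECTIONS, LAYERS)
print(f"{total:,}")  # 671,088,640
```

Under these assumptions the count matches the figure quoted above, which suggests, but does not prove, that the adapter covers all attention and MLP projections.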

Implementation Details

The model was trained for 28,800 steps (approximately 9 epochs) with a batch size of 16 on a Google Cloud VM instance with an NVIDIA A100 GPU, which took over 4 days. It handles prompts of up to 6,000 characters and has a maximum sequence length of 4,096 tokens.
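
Because prompts are limited to 6,000 characters, a longer transcript has to be split before it is sent to the model. The following is a minimal sketch of one way to do that with greedy word packing. The function name split_into_prompts is hypothetical, and this is not necessarily the segmentation used by the model's authors; in particular, it does not re-attach speaker labels at segment boundaries.

```python
def split_into_prompts(text, max_chars=6000):
    """Greedily pack whitespace-separated words into segments of at most max_chars.

    A single word longer than max_chars still becomes its own (oversized) segment.
    """
    segments, current, length = [], [], 0
    for word in text.split():
        # One extra character for the joining space when the segment is non-empty.
        added = len(word) + (1 if current else 0)
        if current and length + added > max_chars:
            segments.append(" ".join(current))
            current, length = [word], len(word)
        else:
            current.append(word)
            length += added
    if current:
        segments.append(" ".join(current))
    return segments


# Usage: a synthetic 15,000-character transcript is split into segments
# that each stay within the limit.
transcript = "word " * 3000
for segment in split_into_prompts(transcript):
    print(len(segment))
```

Each segment would then be processed separately and the outputs concatenated in order.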

  • Training data used the mixed flavor, which combines hyp2ora and deg2ref data
  • 51,063 prompt-completion pairs in the training set
  • Implements LoRA adaptation with rank 256
  • Supports both Safetensors and GGUF formats
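
The "approximately 9 epochs" figure follows from the step count, batch size, and training-set size. The check below assumes that the batch size of 16 is the effective batch size per optimizer step (no gradient accumulation), which the card does not state.

```python
steps = 28_800       # training steps, from the card
batch_size = 16      # assumed to be the effective batch size per step
pairs = 51_063       # prompt-completion pairs in the training set

examples_seen = steps * batch_size
epochs = examples_seen / pairs
print(f"{examples_seen:,} examples, {epochs:.2f} epochs")  # 460,800 examples, 9.02 epochs
```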

Core Capabilities

  • Achieves 3.28% WDER (word diarization error rate) on the Fisher testing set
  • Achieves 18.37% cpWER (concatenated minimum-permutation word error rate) on the Fisher testing set
  • Achieves 6.66% WDER on the Callhome testing set
  • Efficiently processes speaker diarization tasks with high accuracy
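
For readers unfamiliar with WDER, it is usually defined as the fraction of aligned words (those the recognizer got right or substituted) that carry the wrong speaker label. The sketch below is a simplified illustration, not the evaluation code behind the numbers above. It assumes the words are already aligned and that hypothesis speaker labels have already been mapped onto reference labels; the function name wder is hypothetical.

```python
def wder(aligned_pairs):
    """Simplified WDER over pre-aligned words.

    aligned_pairs: list of (reference_speaker, hypothesis_speaker), one entry per
    word aligned as correct or substituted. Inserted and deleted words are
    assumed to have been dropped already.
    """
    if not aligned_pairs:
        raise ValueError("need at least one aligned word")
    wrong = sum(1 for ref, hyp in aligned_pairs if ref != hyp)
    return wrong / len(aligned_pairs)


# Usage: one of four aligned words has the wrong speaker.
pairs = [("1", "1"), ("1", "1"), ("2", "1"), ("2", "2")]
print(f"{wder(pairs):.2%}")  # 25.00%
```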

Frequently Asked Questions

Q: What makes this model unique?

This model differs from its predecessor (v1) by computing loss only on completion tokens, leading to improved performance in speaker diarization tasks. It represents a specialized solution for post-processing speaker identification in conversational content.
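
Computing loss only on completion tokens is commonly done by masking the prompt positions in the label sequence. The sketch below shows the idea with the ignore value -100, which is the default ignore_index of PyTorch's cross-entropy loss. The token IDs and the helper name build_labels are made up for illustration, and the actual training code for this model may differ.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_labels(prompt_ids, completion_ids, completion_only=True):
    """Return (input_ids, labels) for one prompt-completion pair.

    With completion_only=True, prompt positions are masked so that only
    completion tokens contribute to the loss. With False, every token does.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    if completion_only:
        labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    else:
        labels = list(input_ids)
    return input_ids, labels


# Usage with made-up token IDs.
input_ids, labels = build_labels([101, 102, 103], [201, 202])
print(labels)  # [-100, -100, -100, 201, 202]
```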

Q: What are the recommended use cases?

The model is designed for speaker diarization post-processing, which makes it suited to applications that need accurate speaker labels and segmentation in transcripts of conversational audio. It is aimed at multi-speaker conversations where the speaker labels in an existing diarized transcript need to be corrected.
