DiarizationLM-8b-Fisher-v2

Maintained By
google

Property             Value
Parameter Count      8.03B
License              Llama 3
Research Paper       arXiv:2401.03506
Training Data        Fisher Corpus
Max Sequence Length  4096 tokens

What is DiarizationLM-8b-Fisher-v2?

DiarizationLM-8b-Fisher-v2 is a language model fine-tuned for speaker diarization post-processing, that is, for correcting the speaker labels attached to the words of a transcript. It is built on the Llama 3 architecture. Unlike its predecessor (v1), it computes the training loss only on completion tokens, which improves performance on speaker attribution tasks.
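Computing the loss only on completion tokens is commonly done by masking the prompt positions in the label sequence. The sketch below illustrates that general idea and is not the authors' training code: the token ids and the helper names `build_labels` and `count_loss_tokens` are hypothetical, and the value -100 is assumed because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
# Sketch only: token ids and helper names are hypothetical.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(prompt_ids, completion_ids):
    """Return (input_ids, labels) with prompt positions masked out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    # Prompt positions get IGNORE_INDEX, so only completion tokens are scored.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


def count_loss_tokens(labels):
    """Number of positions that would contribute to the loss."""
    return sum(1 for label in labels if label != IGNORE_INDEX)


input_ids, labels = build_labels(prompt_ids=[11, 12, 13, 14],
                                 completion_ids=[21, 22, 23])
print(labels)                     # [-100, -100, -100, -100, 21, 22, 23]
print(count_loss_tokens(labels))  # 3
```

With this masking, a long prompt no longer dominates the gradient: only the three completion tokens in the example are scored.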

Implementation Details

The model was fine-tuned on the Fisher corpus using a LoRA adapter of rank 256, which gives 671,088,640 trainable parameters. Training ran for 28,800 steps (approximately 9 epochs) with a batch size of 16 on an NVIDIA A100 GPU. The training data uses the 'mixed' flavor, which combines the 'hyp2ora' and 'deg2ref' variants and yields 51,063 prompt-completion pairs.
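The figures in this paragraph can be cross-checked with a little arithmetic. The epoch count follows directly from the stated steps, batch size, and number of pairs. The parameter count is reproduced under two assumptions that are not stated here: that the adapter targets all seven linear projections in each decoder layer, and that the base model has the standard Llama 3 8B dimensions (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers). The exact match suggests, but does not confirm, that configuration.

```python
# Epochs implied by the stated training setup.
steps = 28_800
batch_size = 16
num_pairs = 51_063
epochs = steps * batch_size / num_pairs
print(round(epochs, 2))  # 9.02

# LoRA parameter count under ASSUMED target modules and Llama 3 8B shapes.
HIDDEN = 4096         # model (hidden) dimension
INTERMEDIATE = 14336  # MLP inner dimension
KV_DIM = 1024         # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
RANK = 256

# (input dim, output dim) of each adapted projection: an assumption.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, shapes, num_layers):
    """Each adapted matrix adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(lora_params(RANK, PROJECTIONS, NUM_LAYERS))  # 671088640
```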

  • Maximum prompt length: 6000 characters
  • Training duration: 4+ days on A100 GPU
  • Tensor type: BF16
  • Architecture: Llama-based with LoRA adaptation
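Because prompts are limited to 6000 characters, a longer transcript has to be split before it is sent to the model. The following is a minimal sketch of one way to do that, splitting on word boundaries. The function name `split_into_prompts` is hypothetical, and the segmentation actually used by the model's authors is not described here and may differ (for example, it may reserve room for a prompt suffix).

```python
MAX_PROMPT_CHARS = 6000  # maximum prompt length stated above


def split_into_prompts(words, max_chars=MAX_PROMPT_CHARS):
    """Greedily pack words into space-joined prompts of at most max_chars.

    A single word longer than max_chars is emitted as its own oversize prompt.
    """
    prompts = []
    current = []
    current_len = 0
    for word in words:
        # One extra character for the joining space, except for the first word.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            prompts.append(" ".join(current))
            current = [word]
            current_len = len(word)
        else:
            current.append(word)
            current_len += extra
    if current:
        prompts.append(" ".join(current))
    return prompts


words = ["hello"] * 2000
prompts = split_into_prompts(words)
print(len(prompts), max(len(p) for p in prompts))  # 2 5999
```

Splitting on word boundaries keeps every word intact, so the per-segment outputs can be concatenated back into a full transcript.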

Core Capabilities

  • Lower WDER (Word Diarization Error Rate) than the baseline on both test sets:
      • Fisher test set: 3.28% WDER (baseline: 5.32%)
      • Callhome test set: 6.66% WDER (baseline: 7.72%)
  • Post-processing that corrects speaker attribution in diarized transcripts
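To put the WDER figures above in perspective, the short calculation below derives the absolute and relative reductions from the reported numbers. It uses only the four values listed; the helper name `relative_reduction` is illustrative.

```python
# (baseline WDER %, model WDER %) as reported in the list above.
results = {
    "Fisher": (5.32, 3.28),
    "Callhome": (7.72, 6.66),
}


def relative_reduction(baseline, model):
    """Relative error reduction in percent."""
    return (baseline - model) / baseline * 100


for name, (baseline, model) in results.items():
    absolute = baseline - model
    relative = relative_reduction(baseline, model)
    print(f"{name}: -{absolute:.2f} points absolute, {relative:.1f}% relative")
# Fisher: -2.04 points absolute, 38.3% relative
# Callhome: -1.06 points absolute, 13.7% relative
```

The gain is larger on Fisher, the corpus the model was fine-tuned on, than on Callhome.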

Frequently Asked Questions

Q: What makes this model unique?

What distinguishes this model from v1 is that the training loss is computed only on completion tokens. Combined with fine-tuning on the Fisher corpus, this lowers WDER relative to the baseline on both the Fisher and Callhome test sets.

Q: What are the recommended use cases?

The model is designed for post-processing the output of speaker diarization, particularly where conversational transcripts need accurate speaker attribution. It suits speech recognition pipelines in which assigning each word to the correct speaker matters.
