speaker-segmentation-fine-tuned-callhome-eng
| Property | Value |
|---|---|
| Author | diarizers-community |
| Base Model | pyannote/segmentation-3.0 |
| Framework | PyTorch 2.2.2+cu121 |
| Model URL | Hugging Face |
What is speaker-segmentation-fine-tuned-callhome-eng?
This is a speaker segmentation model obtained by fine-tuning pyannote/segmentation-3.0 on the CallHome English dataset. It reports a Diarization Error Rate (DER) of 18.28%, with a false alarm rate of 5.84% and a missed detection rate of 7.17%.
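The reported rates can be cross-checked against the headline DER. The sketch below assumes the usual decomposition of DER into false alarm, missed detection, and speaker confusion, and takes the confusion rate of 5.28% from the Implementation Details section. The page itself does not state the formula. The recomputed value comes to 18.29%, and the 0.01-point gap to the reported 18.28% is consistent with rounding of the individual rates.

```python
# Error rates reported on this page, in percent.
false_alarm = 5.84
missed_detection = 7.17
confusion = 5.28
reported_der = 18.28

# Assumed decomposition: DER = false alarm + missed detection + confusion.
der_from_components = false_alarm + missed_detection + confusion

# Absolute difference between the recomputed and the reported DER.
gap = abs(der_from_components - reported_der)

print(f"DER from components: {der_from_components:.2f}%")
print(f"Gap to reported DER: {gap:.2f} percentage points")
```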
Implementation Details
The model was trained for 5 epochs with the Adam optimizer, a learning rate of 0.001, and a cosine learning-rate scheduler. The batch size was 32 for both training and evaluation. Reported results:
- Training Loss: Decreased from 0.4123 to 0.3475 over 5 epochs
- Validation Loss: Final value of 0.4602
- Confusion Rate: Stabilized at 5.28%
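To illustrate the learning-rate schedule described above, here is a minimal sketch of cosine decay from the stated peak of 0.001 over 5 epochs. It assumes no warmup and decay all the way to zero, neither of which is reported on this page. The number of steps per epoch (`STEPS_PER_EPOCH`) is a hypothetical value, and the function name `cosine_lr` is illustrative rather than taken from the training code.

```python
import math

PEAK_LR = 0.001        # learning rate stated on this page
NUM_EPOCHS = 5         # number of epochs stated on this page
STEPS_PER_EPOCH = 100  # hypothetical: not reported on this page


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr at step 0 to 0 at total_steps (no warmup assumed)."""
    # Clamp progress to [0, 1] so out-of-range steps do not wrap around the cosine.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


total_steps = NUM_EPOCHS * STEPS_PER_EPOCH

# Learning rate at the start of each epoch and at the very end of training.
for epoch in range(NUM_EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: lr = {cosine_lr(step, total_steps):.6f}")
```

Under these assumptions the rate starts at 0.001, passes through half that value at the midpoint of training, and reaches zero at the final step.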
Core Capabilities
- Speaker segmentation for English audio, with a reported DER of 18.28%
- Compatible with the pyannote speaker diarization pipeline
- Runs on GPU or CPU
- Integrates with existing audio processing workflows
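As a sketch of how the pyannote compatibility and the GPU/CPU choice listed above might look in practice, the function below swaps the fine-tuned segmentation model into a pre-trained pyannote diarization pipeline. Several parts are assumptions not stated on this page: the repository ID is inferred from the author and model name, the pipeline checkpoint `pyannote/speaker-diarization-3.1` is one plausible choice, and the `diarizers` package with its `SegmentationModel` class must be installed. The pyannote pipeline is gated on the Hub, so access must be granted and a login token configured beforehand. The helper name `load_finetuned_pipeline` is hypothetical.

```python
def load_finetuned_pipeline(device_name: str = "cpu"):
    """Build a pyannote diarization pipeline that uses the fine-tuned segmentation model.

    Assumes `torch`, `pyannote.audio`, and `diarizers` are installed and that
    access to the gated pyannote pipeline has been granted on the Hub.
    """
    # Local imports keep this function definable without the heavy dependencies.
    import torch
    from diarizers import SegmentationModel
    from pyannote.audio import Pipeline

    device = torch.device(device_name)

    # Pre-trained pipeline; the checkpoint name is an assumption, not stated on this page.
    pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")
    pipeline.to(device)

    # Fine-tuned segmentation model; repository ID assumed from the author and model name.
    model = SegmentationModel().from_pretrained(
        "diarizers-community/speaker-segmentation-fine-tuned-callhome-eng"
    )

    # Replace the pipeline's default segmentation model. This relies on a
    # private attribute and may change between pyannote versions.
    pipeline._segmentation.model = model.to_pyannote_model().to(device)
    return pipeline
```

Calling `load_finetuned_pipeline("cuda:0")` would place both the pipeline and the model on the first GPU, while the default argument keeps everything on the CPU. The returned pipeline can then be applied to an audio file path in the usual pyannote way.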
Frequently Asked Questions
Q: What makes this model unique?
The model adapts the pyannote/segmentation-3.0 architecture to conversational English by fine-tuning on the CallHome English dataset. Its false alarm (5.84%) and missed detection (7.17%) rates are of similar magnitude, so neither over-detecting nor under-detecting speech dominates the error.
Q: What are the recommended use cases?
The model is suited to English-language audio tasks that require speaker segmentation, particularly telephone conversations or similar dialogue. It is intended to be used as the segmentation component of a larger speaker diarization pipeline.