emotion-diarization-wavlm-large

speechbrain

WavLM-based emotion diarization model trained on 6 datasets, achieving 29.7% EDER. Identifies emotion segments in speech with temporal boundaries.

Property | Value
License | Apache 2.0
Framework | PyTorch/SpeechBrain
Paper | Speech Emotion Diarization
Datasets | 6 (ZaionEmotionDataset, IEMOCAP, RAVDESS, JL-corpus, ESD, EMOV-DB)

What is emotion-diarization-wavlm-large?

This is a speech emotion diarization model that uses the WavLM Large encoder to detect emotional segments in a speech recording and locate them in time. It reports a 29.7% Emotion Diarization Error Rate (EDER) on the ZaionEmotionDataset test set. EDER is an error rate, so lower values are better.
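To make the EDER figure more concrete, here is a minimal sketch of how such an error rate can be computed from two segment timelines. It is a simplification, not the paper's or SpeechBrain's scoring code: it assumes each timeline is a list of non-overlapping `(start, end, label)` tuples with one label at any instant, so every kind of error collapses into "the predicted label differs from the reference label". The function names `label_at` and `simplified_eder` are hypothetical.

```python
def label_at(segments, t):
    """Return the label of the segment covering time t, or None if none does."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def simplified_eder(reference, hypothesis):
    """Fraction of the reference duration where the two timelines disagree.

    Assumption: segments within each timeline do not overlap, so there is
    no separate overlap term to account for.
    """
    # Collect every boundary so labels are constant inside each interval.
    bounds = sorted({t for s, e, _ in reference + hypothesis for t in (s, e)})
    total = sum(e - s for s, e, _ in reference)
    error = 0.0
    for left, right in zip(bounds, bounds[1:]):
        mid = (left + right) / 2
        if label_at(reference, mid) != label_at(hypothesis, mid):
            error += right - left
    return error / total


# Hypothetical example: the predicted boundary is 0.5 s late in a 4 s clip.
reference = [(0.0, 2.0, "n"), (2.0, 4.0, "h")]
hypothesis = [(0.0, 2.5, "n"), (2.5, 4.0, "h")]
print(f"{simplified_eder(reference, hypothesis):.3f}")
```

In this toy case only the 0.5 s between the true and predicted boundary is mislabeled, so the score is 0.5 / 4.0.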

Implementation Details

The system combines a WavLM encoder with a frame-wise classifier that predicts the emotion of each frame, from which emotion segments and their boundaries are derived. It expects 16 kHz single-channel audio and automatically normalizes input that does not match.

  • Built on SpeechBrain framework for robust speech processing
  • Supports GPU inference for faster processing
  • Handles multiple emotion categories including neutral, happy, and sad
  • Provides temporal boundaries for emotion segments
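The step from frame-wise predictions to temporal boundaries can be sketched as follows. This is an illustration only, not the model's actual post-processing: the function name `frames_to_segments`, the 20 ms frame shift, and the single-letter labels are assumptions made for the example.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.02):
    """Merge runs of identical frame labels into timed segments.

    frame_shift is the assumed time step between frames, in seconds.
    """
    segments = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append({
            "start": round(index * frame_shift, 3),
            "end": round((index + length) * frame_shift, 3),
            "emotion": label,
        })
        index += length
    return segments


# Hypothetical frame-level output: neutral, then happy, then neutral again.
labels = ["n"] * 5 + ["h"] * 3 + ["n"] * 2
for segment in frames_to_segments(labels):
    print(segment)
```

Each change of label between consecutive frames becomes a segment boundary, which is why the time resolution of the boundaries is limited by the frame shift.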

Core Capabilities

  • Automatic emotion boundary detection in continuous speech
  • Multi-emotion classification within single audio files
  • Temporal segmentation with precise start and end times
  • Processing of various audio formats with automatic normalization
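The capabilities above might be exercised roughly as shown here. The `diarize` wrapper assumes the SpeechBrain 1.0 inference interface (`Speech_Emotion_Diarization` with `from_hparams` and `diarize_file`); check it against your installed version. The segment format with `start`, `end`, and `emotion` keys, and the helper `emotion_durations`, are assumptions for illustration.

```python
from collections import defaultdict

MODEL_ID = "speechbrain/emotion-diarization-wavlm-large"


def diarize(path, device="cpu"):
    """Run the pretrained model on one file (downloads weights on first use)."""
    # Imported lazily so the helper below works without SpeechBrain installed.
    from speechbrain.inference.diarization import Speech_Emotion_Diarization

    model = Speech_Emotion_Diarization.from_hparams(
        source=MODEL_ID,
        run_opts={"device": device},  # e.g. "cuda" for GPU inference
    )
    return model.diarize_file(path)


def emotion_durations(segments):
    """Total seconds per emotion label.

    Assumes each segment is a dict with 'start', 'end', and 'emotion' keys.
    """
    totals = defaultdict(float)
    for segment in segments:
        totals[segment["emotion"]] += segment["end"] - segment["start"]
    return dict(totals)


# Hypothetical segments in the assumed format.
example = [
    {"start": 0.0, "end": 1.5, "emotion": "n"},
    {"start": 1.5, "end": 4.0, "emotion": "h"},
    {"start": 4.0, "end": 5.0, "emotion": "n"},
]
print(emotion_durations(example))
```

Keeping the summary helper separate from the model call means downstream analysis can be developed and tested on stored segments without reloading the model.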

Frequently Asked Questions

Q: What makes this model unique?

This model does more than classify emotions: it also locates their temporal boundaries within speech. It was trained on 6 emotional speech datasets and reports a 29.7% EDER on the ZaionEmotionDataset test set (lower is better).

Q: What are the recommended use cases?

The model is ideal for applications requiring detailed emotional analysis of speech, such as call center monitoring, mental health applications, or research in affective computing. It's particularly useful when temporal information about emotional changes is needed.
