wav2vec2-xlsr-persian-speech-emotion-recognition

Property	Value
License	Apache 2.0
Author	m3hrdadfi
Downloads	47,405
Dataset	ShEMO

What is wav2vec2-xlsr-persian-speech-emotion-recognition?

This is a speech emotion recognition model for Persian (Farsi), built on the Wav2Vec 2.0 XLSR architecture. It classifies speech into six emotions: Anger, Fear, Happiness, Neutral, Sadness, and Surprise, with a reported overall accuracy of 90%.

Implementation Details

The model uses the Wav2Vec 2.0 architecture with XLSR (Cross-Lingual Speech Representations), adapted to Persian speech. It passes audio input through a feature extractor and outputs a probability for each emotion class. Reported performance is strongest for Anger (95% F1-score) and Neutral (93% F1-score).

Built on the PyTorch framework with Transformers integration
Includes a custom feature extraction pipeline
Supports standard audio processing libraries (torchaudio, librosa)
Provides probability scores for each emotion category
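
As a rough illustration of the last point, the sketch below turns a vector of raw classifier outputs (logits) into per-emotion probability scores using a softmax. It is a minimal, dependency-free example and not the model's own code: the label order in EMOTIONS and the values in example_logits are assumptions made up for illustration, and in practice the label mapping should be read from the model's configuration.

```python
import math

# Assumed label order, for illustration only; the real mapping should be
# taken from the model's configuration rather than hard-coded.
EMOTIONS = ["Anger", "Fear", "Happiness", "Neutral", "Sadness", "Surprise"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_emotions(logits, labels=EMOTIONS):
    """Return (label, probability) pairs sorted from most to least likely."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per emotion label")
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Made-up logits standing in for the classifier output on one audio clip.
example_logits = [2.1, -0.3, 0.4, 1.2, -1.0, -0.8]

for label, prob in score_emotions(example_logits):
    print(f"{label}: {prob:.1%}")
```

Sorting the pairs keeps the most likely emotion first, which is convenient when only the top prediction is needed, while the full list still exposes how confident the classifier is in the alternatives.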

Core Capabilities

Real-time emotion classification from Persian speech
High accuracy for anger detection (95% precision)
Robust neutral speech recognition (91% precision)
Support for multiple audio input formats
Easy integration with existing audio processing pipelines
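
The precision figures above and the F1-scores quoted earlier are related but not interchangeable. The sketch below shows how per-class precision, recall, and F1 are computed from prediction counts. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the model's evaluation.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class metrics from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single class (not real evaluation data):
# 95 clips correctly labelled as the class, 5 wrongly labelled as it,
# and 5 clips of the class that were missed.
precision, recall, f1 = precision_recall_f1(tp=95, fp=5, fn=5)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision measures how often a predicted label is correct, recall measures how many true instances are found, and F1 is their harmonic mean, so a class can have high precision and still a lower F1 if many of its instances are missed.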

Frequently Asked Questions

Q: What makes this model unique?

This model is tuned specifically for Persian speech emotion recognition and reports 90% overall accuracy across six emotional states. It builds on Wav2Vec 2.0 XLSR representations adapted to the characteristics of Persian.

Q: What are the recommended use cases?

The model suits Persian speech analysis applications such as sentiment analysis systems, automated customer service evaluation, and emotional intelligence research. It is most effective where accurate detection of anger and neutral speech matters, since those are its strongest reported classes.