# emotion-english-distilroberta-base
| Property | Value |
|---|---|
| Author | j-hartmann |
| Downloads | 1,020,570 |
| Likes | 357 |
| Base Architecture | DistilRoBERTa |
| Evaluation Accuracy | 66% |
## What is emotion-english-distilroberta-base?
This is a specialized emotion classification model built on the DistilRoBERTa architecture, designed to identify seven distinct emotions in English text: anger, disgust, fear, joy, neutral, sadness, and surprise. It was trained on a carefully curated, balanced dataset of roughly 20,000 observations drawn from diverse sources, including Twitter, Reddit, student self-reports, and TV dialogues.
## Implementation Details
The model is implemented with the Hugging Face Transformers library and can be deployed with PyTorch. It is built on DistilRoBERTa-base, a distilled, more compact version of RoBERTa that retains robust performance. The balanced training data includes 2,811 observations per emotion category, split 80/20 into training and evaluation sets.
- Simple integration with Hugging Face's pipeline API
- Supports batch processing for multiple examples
- Compatible with various text formats including CSV files
- Provides probability scores for all emotion categories
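The integration points above can be sketched with the pipeline API; the model ID comes from this card, while the sample texts are purely illustrative:

```python
from transformers import pipeline

# Load the model via the Hugging Face pipeline API.
# top_k=None returns probability scores for all seven emotion categories.
classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,
)

# Batch processing: pass a list of texts in a single call.
texts = ["I love this!", "This is terrifying."]
results = classifier(texts)

# One list of {"label", "score"} dicts per input text.
for text, scores in zip(texts, results):
    top = max(scores, key=lambda s: s["score"])
    print(f"{text!r} -> {top['label']} ({top['score']:.3f})")
```

To classify a CSV column, read it with any CSV reader and pass the resulting list of strings to `classifier` in the same way.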
## Core Capabilities
- Multi-class emotion classification across 7 categories
- 66% evaluation accuracy (vs. 14% random baseline)
- Processes both single texts and batch inputs
- Returns confidence scores for each emotion category
- Optimized for English language text analysis
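Because the model returns a confidence score for every category, downstream code typically reduces each result to a top label plus its probability. A minimal sketch, assuming the list-of-dicts score format produced by the pipeline API (the `top_emotion` helper name and the sample scores are my own, not real model output):

```python
# The seven emotion categories listed on this card.
EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]

def top_emotion(scores):
    """Return the highest-probability (label, score) pair for one text,
    given a list of {"label": str, "score": float} dicts."""
    best = max(scores, key=lambda s: s["score"])
    return best["label"], best["score"]

# Illustrative scores only (not actual model output):
sample = [
    {"label": "joy", "score": 0.91},
    {"label": "neutral", "score": 0.05},
    {"label": "surprise", "score": 0.04},
]
print(top_emotion(sample))  # ('joy', 0.91)
```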
## Frequently Asked Questions
**Q: What makes this model unique?**
A: The model stands out for its training on six diverse datasets, its balanced representation across emotion categories, and its efficient DistilRoBERTa architecture, which makes it suitable for production environments while keeping accuracy well above the 14% random baseline.
**Q: What are the recommended use cases?**
A: The model is well suited to emotion analysis in social media monitoring, customer feedback analysis, content moderation, and research applications. It is particularly effective on Twitter and Reddit content, which feature prominently in its training data.