Multilingual Sarcasm Detector
| Property | Value |
|---|---|
| Base Model | bert-base-multilingual-uncased |
| Languages | English, Dutch, Italian |
| Task | Sarcasm Detection |
| Performance | 87.23% F1 Score |
What is multilingual-sarcasm-detector?
The multilingual-sarcasm-detector is a text classification model that detects sarcasm in news article headlines written in English, Dutch, or Italian. It is built on the BERT multilingual architecture and fine-tuned on news headlines from all three languages, drawn from both traditional and satirical news outlets.
Implementation Details
The model uses the bert-base-multilingual-uncased architecture as a binary classifier: label 0 indicates non-sarcastic content and label 1 indicates sarcastic content. The training data combines Kaggle datasets with content scraped manually from various news sources, including De Speld, Il Giornale, and Lercio.
- Preprocesses text by converting to lowercase and removing punctuation
- Supports maximum sequence length of 256 tokens
- Returns confidence scores along with predictions
- Runs on a PyTorch backend through the Transformers library
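The preprocessing step listed above (lowercasing and punctuation removal) can be sketched in a few lines of standard-library Python. This is a minimal illustration and not necessarily the exact routine used during training: the function name `preprocess_headline` is hypothetical, and the choice of `string.punctuation` (ASCII punctuation only) and the whitespace collapsing are assumptions.

```python
import string


def preprocess_headline(text: str) -> str:
    """Lowercase a headline and strip ASCII punctuation (assumed rule set)."""
    lowered = text.lower()
    # string.punctuation covers ASCII punctuation only; typographic quotes
    # or dashes would need an additional rule.
    stripped = lowered.translate(str.maketrans("", "", string.punctuation))
    # Collapse any whitespace runs left behind by the removed characters.
    return " ".join(stripped.split())


print(preprocess_headline("Breaking: Local Man 'Shocked' by Monday!"))
```

Truncation to the 256-token limit is not handled here; it is normally left to the tokenizer, because the limit counts tokens and not characters.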
Core Capabilities
- Multi-language sarcasm detection (English, Dutch, Italian)
- 88.30% overall accuracy across the three languages
- Real-time inference with confidence scoring
- Handles various news writing styles and cultural contexts
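As a rough sketch of what inference with confidence scoring might look like, the example below turns the two output logits into a label and a softmax probability. The helper names `scores_to_prediction` and `classify_headline` are hypothetical, `model_id` is a placeholder the caller must supply (the model's hub identifier is not given here), and the input is assumed to be already lowercased and stripped of punctuation. Only standard Transformers and PyTorch calls are used, and they are imported lazily so the scoring helper can be run on its own.

```python
import math

# Label mapping as described for this model: 0 = non-sarcastic, 1 = sarcastic.
LABELS = {0: "non-sarcastic", 1: "sarcastic"}
MAX_TOKENS = 256  # maximum sequence length stated for the model


def scores_to_prediction(logits):
    """Convert raw logits into (label, confidence) using a softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def classify_headline(text, model_id):
    """Run one headline through a fine-tuned checkpoint (model_id is assumed)."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_TOKENS, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return scores_to_prediction(logits)


# The scoring step alone, with made-up logits:
label, confidence = scores_to_prediction([1.0, 3.0])
print(label, round(confidence, 3))
```

Loading the tokenizer and model inside `classify_headline` keeps the sketch short; a real service would load them once and reuse them across requests.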
Frequently Asked Questions
Q: What makes this model unique?
A single model covers three languages (English, Dutch, and Italian) and reaches an F1 score of 87.23%. It is also meant to handle the cultural and linguistic nuances of sarcasm, which differ from one language to another.
Q: What are the recommended use cases?
The model is ideal for news aggregators, content moderators, and media analysis tools that need to automatically detect satirical or sarcastic news content. It's particularly valuable for organizations working with multilingual content or requiring cross-language sarcasm detection.