Celadon Toxicity Classifier
| Property | Value |
|---|---|
| Parameter Count | 141M |
| Model Type | DeBERTa-v3-small |
| Supported Languages | 9 (EN, NL, ES, DE, PL, LA, IT, FR, PT) |
| Research Paper | Toxicity of the Commons |
| Training Data | 600k samples from ToxicCommons |
Research Paper | Toxicity of the Commons |
Training Data | 600k samples from ToxicCommons |
What is Celadon?
Celadon is a multi-head toxicity classification model built on the DeBERTa-v3-small architecture. Named after a type of historical porcelain reputed for its protective properties, the model detects five distinct dimensions of toxic content across nine languages.
Implementation Details
The model adds five classification heads to the DeBERTa-v3-small backbone (141M parameters), one per toxicity dimension. Input text passes through the model's tokenizer once, and a single forward pass produces classification results from all five heads simultaneously.
- Multi-head architecture for five distinct toxicity categories
- Trained on 600,000 samples from ToxicCommons dataset
- Cross-lingual capability supporting 9 different languages
- Implements state-of-the-art DeBERTa-v3 architecture
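The multi-head design described above can be illustrated with a small decoding sketch. This is a hedged illustration rather than the model's actual API: it assumes each of the five heads emits logits over a small set of severity levels (a 0-3 scale is assumed here) and that each head is decoded independently with an argmax.

```python
# Sketch of decoding multi-head classifier outputs.
# Assumptions (not from the model card): each head emits logits over
# four severity levels (0-3), and heads are decoded independently.

CATEGORIES = [
    "race_origin",
    "gender_sexuality",
    "religion",
    "ability",
    "violence_abuse",
]

def decode_heads(logits_per_head):
    """Map raw per-head logits to a {category: severity} dict.

    logits_per_head: list of 5 lists, one per head, each holding
    that head's logits over its severity classes.
    """
    if len(logits_per_head) != len(CATEGORIES):
        raise ValueError("expected one logit vector per category head")
    scores = {}
    for name, logits in zip(CATEGORIES, logits_per_head):
        # Independent argmax per head: the heads do not compete
        # with each other, unlike a single softmax over categories.
        scores[name] = max(range(len(logits)), key=lambda i: logits[i])
    return scores

# Example: five heads, four severity logits each.
example = [
    [2.1, 0.3, -1.0, -2.5],   # race_origin -> severity 0
    [-0.5, 1.7, 0.2, -1.1],   # gender_sexuality -> severity 1
    [0.9, 0.1, -0.3, -2.0],   # religion -> severity 0
    [1.5, -0.2, -0.8, -1.9],  # ability -> severity 0
    [-1.2, -0.4, 0.6, 2.3],   # violence_abuse -> severity 3
]
print(decode_heads(example))
```

Decoding each head separately is what lets one input be flagged in several categories at once, e.g. both gender-based discrimination and violence.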
Core Capabilities
- Race and origin-based bias detection
- Gender and sexuality-based discrimination identification
- Religious bias classification
- Ability-based discrimination detection
- Violence and abuse content identification
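As a hedged illustration of how per-category outputs like these might feed a moderation pipeline, the sketch below collapses five severity scores into a single action. The 0-3 severity scale, threshold values, and action names are assumptions chosen for the example, not documented model behavior.

```python
# Illustrative moderation gate over Celadon-style category scores.
# Assumptions: each category carries a 0-3 severity, and the
# thresholds below are arbitrary choices for the sketch.

def moderation_action(scores):
    """Collapse per-category severities into one action.

    scores: dict mapping category name -> severity int (0 = clean).
    Returns "block", "review", or "allow".
    """
    worst = max(scores.values())
    if worst >= 3:
        return "block"    # any category at maximum severity
    if worst >= 1:
        return "review"   # mild or moderate signal somewhere
    return "allow"        # all five heads report clean text

print(moderation_action({
    "race_origin": 0,
    "gender_sexuality": 0,
    "religion": 0,
    "ability": 0,
    "violence_abuse": 3,
}))  # -> block
```

A real deployment would likely tune thresholds per category, since the acceptable severity for, say, violence may differ from that for religious bias.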
Frequently Asked Questions
Q: What makes this model unique?
Rather than producing a single aggregate toxicity score, Celadon scores five separate toxicity dimensions across nine languages in one compact 141M-parameter model, trained on 600,000 curated samples from ToxicCommons.
Q: What are the recommended use cases?
The model suits content moderation systems, social media platforms, and research applications that need multi-dimensional toxicity analysis across languages. It is particularly useful where different types of harmful content must be distinguished rather than collapsed into a single score.