Celadon Toxicity Classifier
| Property | Value |
|---|---|
| Parameter Count | 141M |
| Model Type | DeBERTa-v3-small |
| Supported Languages | 9 (EN, NL, ES, DE, PL, LA, IT, FR, PT) |
| Research Paper | Toxicity of the Commons |
| Training Data | 600k samples from ToxicCommons |
Research Paper | Toxicity of the Commons |
Training Data | 600k samples from ToxicCommons |
What is Celadon?
Celadon is a multi-head toxicity classification model built on the DeBERTa-v3-small architecture. Named after a type of historical porcelain reputed for its protective properties, the model detects five distinct dimensions of toxic content across nine languages.
Implementation Details
The model adds five classification heads to the DeBERTa-v3-small backbone (141M parameters), one per toxicity dimension. Input text passes through the model's tokenizer once, and a single forward pass produces classification results from all five heads simultaneously.
- Multi-head architecture for five distinct toxicity categories
- Trained on 600,000 samples from ToxicCommons dataset
- Cross-lingual capability supporting 9 different languages
- Implements state-of-the-art DeBERTa-v3 architecture
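The multi-head design described above can be illustrated with a small decoding sketch. This is a hedged illustration rather than the model's actual API: it assumes each of the five heads emits logits over a small set of severity levels (a 0-3 scale is assumed here) and that each head is decoded independently with an argmax.

```python
# Sketch of decoding multi-head classifier outputs.
# Assumptions (not from the model card): each head emits logits over
# four severity levels (0-3), and heads are decoded independently.

CATEGORIES = [
    "race_origin",
    "gender_sexuality",
    "religion",
    "ability",
    "violence_abuse",
]

def decode_heads(logits_per_head):
    """Map raw per-head logits to a {category: severity} dict.

    logits_per_head: list of 5 lists, one per head, each holding
    that head's logits over its severity classes.
    """
    if len(logits_per_head) != len(CATEGORIES):
        raise ValueError("expected one logit vector per category head")
    scores = {}
    for name, logits in zip(CATEGORIES, logits_per_head):
        # Independent argmax per head: the heads do not compete
        # with each other, unlike a single softmax over categories.
        scores[name] = max(range(len(logits)), key=lambda i: logits[i])
    return scores

# Example: five heads, four severity logits each.
example = [
    [2.1, 0.3, -1.0, -2.5],   # race_origin -> severity 0
    [-0.5, 1.7, 0.2, -1.1],   # gender_sexuality -> severity 1
    [0.9, 0.1, -0.3, -2.0],   # religion -> severity 0
    [1.5, -0.2, -0.8, -1.9],  # ability -> severity 0
    [-1.2, -0.4, 0.6, 2.3],   # violence_abuse -> severity 3
]
print(decode_heads(example))
```

Decoding each head separately is what lets one input be flagged in several categories at once, e.g. both gender-based discrimination and violence.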
Core Capabilities
- Race and origin-based bias detection
- Gender and sexuality-based discrimination identification
- Religious bias classification
- Ability-based discrimination detection
- Violence and abuse content identification
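As a hedged illustration of how per-category outputs like these might feed a moderation pipeline, the sketch below collapses five severity scores into a single action. The 0-3 severity scale, threshold values, and action names are assumptions chosen for the example, not documented model behavior.

```python
# Illustrative moderation gate over Celadon-style category scores.
# Assumptions: each category carries a 0-3 severity, and the
# thresholds below are arbitrary choices for the sketch.

def moderation_action(scores):
    """Collapse per-category severities into one action.

    scores: dict mapping category name -> severity int (0 = clean).
    Returns "block", "review", or "allow".
    """
    worst = max(scores.values())
    if worst >= 3:
        return "block"    # any category at maximum severity
    if worst >= 1:
        return "review"   # mild or moderate signal somewhere
    return "allow"        # all five heads report clean text

print(moderation_action({
    "race_origin": 0,
    "gender_sexuality": 0,
    "religion": 0,
    "ability": 0,
    "violence_abuse": 3,
}))  # -> block
```

A real deployment would likely tune thresholds per category, since the acceptable severity for, say, violence may differ from that for religious bias.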
Frequently Asked Questions
Q: What makes this model unique?
Rather than producing a single aggregate toxicity score, Celadon scores five separate toxicity dimensions across nine languages in one compact 141M-parameter model, trained on 600,000 curated samples from ToxicCommons.
Q: What are the recommended use cases?
The model suits content moderation systems, social media platforms, and research applications that need multi-dimensional toxicity analysis across languages. It is particularly useful where different types of harmful content must be distinguished rather than collapsed into a single score.