robbert-dutch-base-toxic-comments
| Property | Value |
|---|---|
| Model Type | RoBERTa-based |
| Language | Dutch |
| Task | Toxic Comment Detection |
| Accuracy | 95.63% |
| F1 Score | 78.80% |
| Hugging Face | ml6team/robbert-dutch-base-toxic-comments |
What is robbert-dutch-base-toxic-comments?
robbert-dutch-base-toxic-comments is a specialized natural language processing model designed to detect toxic and potentially harmful comments in Dutch text. Built upon the RobBERT architecture, this model was fine-tuned using a translated version of the Jigsaw Toxicity dataset, making it particularly effective for Dutch content moderation tasks.
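Since the model is published on the Hugging Face Hub, it can be loaded through the standard transformers text-classification pipeline. The sketch below assumes the usual sequence-classification setup; the label strings in the output come from the model's config and should be inspected rather than assumed:

```python
# Minimal inference sketch using the standard transformers pipeline API.
from transformers import pipeline

def build_toxicity_classifier(
    model_name: str = "ml6team/robbert-dutch-base-toxic-comments",
):
    """Create a text-classification pipeline for Dutch toxicity detection.

    Downloads the model from the Hugging Face Hub on first call.
    """
    return pipeline("text-classification", model=model_name)

# Usage (requires network access to the Hub):
# clf = build_toxicity_classifier()
# clf("Dit is een vriendelijke reactie.")
# # -> list of {"label": ..., "score": ...} dicts
```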
Implementation Details
The model was trained for 2 epochs on 90% of the translated Jigsaw dataset, with a learning rate of 1e-5 and a batch size of 8. Training used gradient accumulation over 6 steps and a weight decay of 0.001, with recall as the primary optimization metric.
- Learning rate: 1e-5
- Batch size: 8 (both training and evaluation)
- Gradient accumulation steps: 6
- Weight decay: 0.001
- Evaluation strategy: Steps-based
Core Capabilities
- Toxic comment detection in Dutch text
- High accuracy (95.63%) on the test dataset
- Balanced performance metrics (F1: 78.80%, Recall: 78.99%, Precision: 78.61%)
- Suitable for content moderation applications
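The metrics quoted above are internally consistent: F1 is the harmonic mean of precision and recall, which can be verified directly from the reported figures:

```python
# Sanity check: the reported F1 score is the harmonic mean of the
# reported precision and recall.
precision = 78.61
recall = 78.99
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 78.8
```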
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Dutch language toxic comment detection, utilizing a translated version of the Jigsaw dataset and achieving high accuracy while maintaining balanced precision and recall metrics.
Q: What are the recommended use cases?
The model is ideal for content moderation systems, online platforms, and applications requiring Dutch language toxic content detection. It can be integrated into automated moderation pipelines or used for content analysis.
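In an automated moderation pipeline, predictions would typically be post-processed against a score threshold before any action is taken. The sketch below assumes the standard transformers text-classification output format (`{"label": ..., "score": ...}`); the exact label string (`"toxic"` here) and the threshold value are illustrative assumptions, not documented behavior:

```python
# Hypothetical post-processing step for a moderation pipeline: flag a
# comment when the classifier's toxic score exceeds a threshold.
def should_flag(prediction: dict, threshold: float = 0.8) -> bool:
    """Return True if a single pipeline prediction should be flagged.

    `prediction` follows the transformers text-classification output
    format; the "toxic" label name is an assumption about this model.
    """
    return (
        prediction["label"].lower() == "toxic"
        and prediction["score"] >= threshold
    )

# Example with mocked predictions:
print(should_flag({"label": "toxic", "score": 0.93}))      # True
print(should_flag({"label": "non-toxic", "score": 0.97}))  # False
```

Thresholding on the score rather than acting on the raw label lets platforms trade off precision against recall to match their moderation policy.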