robbert-dutch-base-toxic-comments
| Property | Value |
|---|---|
| Model Type | RoBERTa-based |
| Language | Dutch |
| Task | Toxic Comment Detection |
| Accuracy | 95.63% |
| F1 Score | 78.80% |
| Hugging Face | ml6team/robbert-dutch-base-toxic-comments |
What is robbert-dutch-base-toxic-comments?
robbert-dutch-base-toxic-comments is a specialized natural language processing model designed to detect toxic and potentially harmful comments in Dutch text. Built upon the RobBERT architecture, this model was fine-tuned using a translated version of the Jigsaw Toxicity dataset, making it particularly effective for Dutch content moderation tasks.
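Since the model is published on the Hugging Face Hub, it can be loaded through the standard transformers text-classification pipeline. The sketch below assumes the usual sequence-classification setup; the label strings in the output come from the model's config and should be inspected rather than assumed:

```python
# Minimal inference sketch using the standard transformers pipeline API.
from transformers import pipeline

def build_toxicity_classifier(
    model_name: str = "ml6team/robbert-dutch-base-toxic-comments",
):
    """Create a text-classification pipeline for Dutch toxicity detection.

    Downloads the model from the Hugging Face Hub on first call.
    """
    return pipeline("text-classification", model=model_name)

# Usage (requires network access to the Hub):
# clf = build_toxicity_classifier()
# clf("Dit is een vriendelijke reactie.")
# # -> list of {"label": ..., "score": ...} dicts
```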
Implementation Details
The model was trained for 2 epochs on 90% of the translated Jigsaw dataset, with a learning rate of 1e-5 and a batch size of 8. Training used gradient accumulation over 6 steps and a weight decay of 0.001, with recall as the primary optimization metric.
- Learning rate: 1e-5
- Batch size: 8 (both training and evaluation)
- Gradient accumulation steps: 6
- Weight decay: 0.001
- Evaluation strategy: Steps-based
Core Capabilities
- Toxic comment detection in Dutch text
- High accuracy (95.63%) on the test dataset
- Balanced performance metrics (F1: 78.80%, Recall: 78.99%, Precision: 78.61%)
- Suitable for content moderation applications
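The metrics quoted above are internally consistent: F1 is the harmonic mean of precision and recall, which can be verified directly from the reported figures:

```python
# Sanity check: the reported F1 score is the harmonic mean of the
# reported precision and recall.
precision = 78.61
recall = 78.99
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 78.8
```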
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Dutch language toxic comment detection, utilizing a translated version of the Jigsaw dataset and achieving high accuracy while maintaining balanced precision and recall metrics.
Q: What are the recommended use cases?
The model is ideal for content moderation systems, online platforms, and applications requiring Dutch language toxic content detection. It can be integrated into automated moderation pipelines or used for content analysis.
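In an automated moderation pipeline, predictions would typically be post-processed against a score threshold before any action is taken. The sketch below assumes the standard transformers text-classification output format (`{"label": ..., "score": ...}`); the exact label string (`"toxic"` here) and the threshold value are illustrative assumptions, not documented behavior:

```python
# Hypothetical post-processing step for a moderation pipeline: flag a
# comment when the classifier's toxic score exceeds a threshold.
def should_flag(prediction: dict, threshold: float = 0.8) -> bool:
    """Return True if a single pipeline prediction should be flagged.

    `prediction` follows the transformers text-classification output
    format; the "toxic" label name is an assumption about this model.
    """
    return (
        prediction["label"].lower() == "toxic"
        and prediction["score"] >= threshold
    )

# Example with mocked predictions:
print(should_flag({"label": "toxic", "score": 0.93}))      # True
print(should_flag({"label": "non-toxic", "score": 0.97}))  # False
```

Thresholding on the score rather than acting on the raw label lets platforms trade off precision against recall to match their moderation policy.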