# roberta-base-cold
| Property | Value |
|---|---|
| Parameter Count | 102M |
| Model Type | Text Classification |
| Architecture | RoBERTa (Chinese) |
| Research Paper | Link |
| Performance | 82.75% accuracy, 82.39% macro-F1 |
## What is roberta-base-cold?
roberta-base-cold is a Chinese language model fine-tuned to detect offensive content in text. Built on the Chinese RoBERTa architecture, it is fine-tuned on the COLDataset to classify offensive language in Chinese text, reaching 82.75% accuracy on that benchmark.
## Implementation Details
The model is built on the hfl/chinese-roberta-wwm-ext checkpoint and implemented in PyTorch. It uses the `BertTokenizer` and `BertForSequenceClassification` classes from Hugging Face Transformers, making it straightforward to integrate into existing NLP pipelines.
- Binary classification output (0 for Non-Offensive, 1 for Offensive)
- Supports batch processing with padding
- Uses PyTorch tensors for efficient computation
- Uses safetensors for model storage
## Core Capabilities
- Chinese text classification for offensive content
- High accuracy (82.75%) in detecting offensive language
- Efficient processing with transformer architecture
- Support for batch inference
## Frequently Asked Questions
Q: What makes this model unique?
This model specializes in Chinese offensive language detection with strong benchmark accuracy (82.75% on the COLDataset), making it particularly valuable for content moderation and social media analysis in Chinese-language contexts.
Q: What are the recommended use cases?
The model is ideal for content moderation systems, social media platforms, and research applications requiring Chinese text analysis for offensive content. It can be used for automated content filtering, research in online behavior, and social media monitoring.
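For a moderation use case like the ones above, the model can also be wrapped in a Transformers `pipeline`. This is a hedged sketch: the repository id `thu-coai/roberta-base-cold` and the `LABEL_1` naming are assumptions that depend on the model's actual config:

```python
# Hedged sketch: a simple content filter built on the classifier.
# ASSUMPTIONS: the Hub id below, and that the config maps class 1
# ("Offensive") to a label name ending in "1" (the default LABEL_1).
from transformers import pipeline

moderator = pipeline("text-classification", model="thu-coai/roberta-base-cold")

def is_offensive(text: str) -> bool:
    """Return True when the classifier assigns class 1 (Offensive)."""
    result = moderator(text)[0]  # e.g. {"label": "LABEL_0", "score": 0.98}
    return result["label"].endswith("1")

# Example: drop flagged posts before display.
posts = ["今天天气真好", "你就是个傻逼！"]
safe_posts = [p for p in posts if not is_offensive(p)]
print(safe_posts)
```

The `pipeline` wrapper handles tokenization, padding, and softmax internally, which keeps filtering code short at the cost of some control over batching.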