# roberta-base-cold
| Property | Value |
|---|---|
| Parameter Count | 102M |
| Model Type | Text Classification |
| Architecture | RoBERTa (Chinese) |
| Research Paper | Link |
| Performance | 82.75% accuracy, 82.39% macro-F1 |
## What is roberta-base-cold?
roberta-base-cold is a Chinese language model fine-tuned to detect offensive content in text. Built on the Chinese RoBERTa architecture, it is fine-tuned on the COLDataset to classify offensive language in Chinese text, reaching 82.75% accuracy on that benchmark.
## Implementation Details
The model is built on the hfl/chinese-roberta-wwm-ext checkpoint and implemented in PyTorch. It uses the `BertTokenizer` and `BertForSequenceClassification` classes from Hugging Face Transformers, making it straightforward to integrate into existing NLP pipelines.
- Binary classification output (0 for Non-Offensive, 1 for Offensive)
- Supports batch processing with padding
- Uses PyTorch tensors for efficient computation
- Uses safetensors for model storage
## Core Capabilities
- Chinese text classification for offensive content
- High accuracy (82.75%) in detecting offensive language
- Efficient processing with transformer architecture
- Support for batch inference
## Frequently Asked Questions
Q: What makes this model unique?
This model specializes in Chinese offensive language detection with strong benchmark accuracy (82.75% on the COLDataset), making it particularly valuable for content moderation and social media analysis in Chinese-language contexts.
Q: What are the recommended use cases?
The model is ideal for content moderation systems, social media platforms, and research applications requiring Chinese text analysis for offensive content. It can be used for automated content filtering, research in online behavior, and social media monitoring.
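For a moderation use case like the ones above, the model can also be wrapped in a Transformers `pipeline`. This is a hedged sketch: the repository id `thu-coai/roberta-base-cold` and the `LABEL_1` naming are assumptions that depend on the model's actual config:

```python
# Hedged sketch: a simple content filter built on the classifier.
# ASSUMPTIONS: the Hub id below, and that the config maps class 1
# ("Offensive") to a label name ending in "1" (the default LABEL_1).
from transformers import pipeline

moderator = pipeline("text-classification", model="thu-coai/roberta-base-cold")

def is_offensive(text: str) -> bool:
    """Return True when the classifier assigns class 1 (Offensive)."""
    result = moderator(text)[0]  # e.g. {"label": "LABEL_0", "score": 0.98}
    return result["label"].endswith("1")

# Example: drop flagged posts before display.
posts = ["今天天气真好", "你就是个傻逼！"]
safe_posts = [p for p in posts if not is_offensive(p)]
print(safe_posts)
```

The `pipeline` wrapper handles tokenization, padding, and softmax internally, which keeps filtering code short at the cost of some control over batching.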