ViSoBERT
Property | Value |
---|---|
Paper | EMNLP 2023 |
Architecture | XLM-R based |
Task Type | Fill-Mask, Social Media Text Processing |
Language | Vietnamese |
What is ViSoBERT?
ViSoBERT is the first monolingual masked language model (MLM) designed specifically for Vietnamese social media text. Published at EMNLP 2023, it extends Vietnamese NLP to the informal language of social media applications. It is built on the XLM-R architecture and pre-trained on a large-scale corpus of Vietnamese social media text.
Implementation Details
The model is loaded through the Hugging Face transformers library and uses a SentencePiece tokenizer. It is implemented in PyTorch and follows the XLM-R architecture, with pre-training tailored to Vietnamese social media text.
- Pre-trained on diverse Vietnamese social media texts
- Requires few dependencies (transformers and SentencePiece, plus PyTorch)
- Supports both CPU and GPU inference
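The loading and CPU/GPU points above can be sketched as follows. This is a minimal sketch, not the authors' reference code: it assumes the checkpoint is published on the Hugging Face Hub as `uitnlp/visobert` (change `MODEL_ID` if your copy lives elsewhere) and that `transformers`, `sentencepiece`, and `torch` are installed. The helper names `pick_device` and `load_visobert` are hypothetical.

```python
# Minimal sketch: load ViSoBERT for masked-LM inference on CPU or GPU.
# ASSUMPTION: the checkpoint is on the Hugging Face Hub as "uitnlp/visobert".
# Requires: pip install transformers sentencepiece torch

MODEL_ID = "uitnlp/visobert"  # assumed Hub identifier


def pick_device(cuda_available: bool) -> str:
    """Return the device string to use: a GPU if one is visible, else the CPU."""
    return "cuda" if cuda_available else "cpu"


def load_visobert(model_id: str = MODEL_ID):
    """Load the tokenizer and masked-LM model, moving the model to the chosen device."""
    # Heavy dependencies are imported lazily so this module imports without them.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(model_id)  # SentencePiece-based tokenizer
    model = AutoModelForMaskedLM.from_pretrained(model_id).to(device)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model, device
```

Calling `tokenizer, model, device = load_visobert()` downloads the weights on first use; the same code runs unchanged on a machine with or without a CUDA GPU.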
Core Capabilities
- Emotion Recognition
- Hate Speech Detection
- Sentiment Analysis
- Spam Review Detection
- Hate Speech Span Detection
- Fill-Mask Task Support
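For the fill-mask capability, one way to query the model is the generic transformers `fill-mask` pipeline. This is a sketch under assumptions: the Hub identifier `uitnlp/visobert` is assumed, and `mask_word`, `top_tokens`, and `fill_mask` are hypothetical helpers. The mask token is read from the tokenizer rather than hard-coded, since XLM-R-style tokenizers use `<mask>` but a given checkpoint could differ.

```python
# Sketch: query ViSoBERT through the transformers fill-mask pipeline.
# ASSUMPTION: Hub identifier "uitnlp/visobert"; helper names are hypothetical.


def mask_word(sentence: str, target: str, mask_token: str) -> str:
    """Replace the first occurrence of `target` with the model's mask token."""
    if target not in sentence:
        raise ValueError(f"{target!r} not found in sentence")
    return sentence.replace(target, mask_token, 1)


def top_tokens(predictions: list[dict], k: int = 3) -> list[str]:
    """Return the k highest-scoring candidate strings from fill-mask pipeline output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def fill_mask(sentence: str, target: str, model_id: str = "uitnlp/visobert", k: int = 3) -> list[str]:
    """Mask `target` in `sentence` and return the model's top-k replacements."""
    from transformers import pipeline  # lazy import: heavy dependency

    unmasker = pipeline("fill-mask", model=model_id)
    masked = mask_word(sentence, target, unmasker.tokenizer.mask_token)
    return top_tokens(unmasker(masked, top_k=k), k)
```

For example, `fill_mask("Hôm nay trời đẹp quá", "đẹp")` masks the word "đẹp" and returns the three candidates the model scores highest for that position.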
Frequently Asked Questions
Q: What makes this model unique?
ViSoBERT is the first monolingual MLM built specifically for Vietnamese social media text. It outperforms previous monolingual models, multilingual models, and multilingual social media models on several downstream tasks.
Q: What are the recommended use cases?
The model is particularly suited to Vietnamese social media text analysis, including sentiment analysis, hate speech detection, emotion recognition, and spam review detection. It is designed specifically for informal, social media content.
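Using ViSoBERT for a task such as sentiment analysis typically means fine-tuning it with a task-specific head. The sketch below shows one common way to set that up with transformers; it is not the authors' training recipe. The Hub identifier `uitnlp/visobert`, the label set `SENTIMENT_LABELS`, the `max_length` of 128, and all helper names are hypothetical placeholders to replace with your own.

```python
# Sketch: attach a classification head to ViSoBERT for a downstream task.
# ASSUMPTIONS: Hub id "uitnlp/visobert"; SENTIMENT_LABELS and max_length=128
# are hypothetical placeholders; helper names are illustrative only.

SENTIMENT_LABELS = ["negative", "neutral", "positive"]  # hypothetical label set


def build_label_maps(labels: list[str]) -> tuple[dict[str, int], dict[int, str]]:
    """Map label names to integer ids and back, as transformers configs expect."""
    label2id = {name: i for i, name in enumerate(labels)}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def load_classifier(labels: list[str], model_id: str = "uitnlp/visobert"):
    """Load the pre-trained encoder with a freshly initialised classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model


def encode_batch(tokenizer, texts: list[str], max_length: int = 128):
    """Tokenize a batch of posts with padding and truncation, as PyTorch tensors."""
    return tokenizer(texts, padding=True, truncation=True,
                     max_length=max_length, return_tensors="pt")
```

After `tokenizer, model = load_classifier(SENTIMENT_LABELS)`, the classification head is randomly initialized, so the model must be fine-tuned on labelled data (for example with `transformers.Trainer` or a plain PyTorch loop) before its predictions are meaningful.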