ViSoBERT

Property	Value
Paper	EMNLP 2023
Architecture	XLM-R based
Task Type	Fill-Mask, Social Media Text Processing
Language	Vietnamese

What is ViSoBERT?

ViSoBERT is the first monolingual masked language model designed specifically for Vietnamese social media text. Published at EMNLP 2023, it follows the XLM-R architecture and is pre-trained on a large-scale corpus of Vietnamese social media texts.

Implementation Details

The model is loaded through the transformers library with PyTorch and requires SentencePiece for tokenization. It follows the XLM-R architecture but is pre-trained on Vietnamese text only.

Pre-trained on diverse Vietnamese social media texts
Requires few dependencies (transformers and SentencePiece, on top of PyTorch)
Supports both CPU and GPU inference
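
As a rough illustration of the points above, the sketch below loads the model with transformers and runs it on a GPU when one is available, falling back to CPU otherwise. The Hub identifier `uitnlp/visobert` is an assumption (this page does not state it), and the helper names `pick_device` and `embed` are hypothetical.

```python
from typing import List

# Assumed Hugging Face Hub identifier; it is not stated on this page.
MODEL_ID = "uitnlp/visobert"


def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def embed(texts: List[str], model_id: str = MODEL_ID):
    """Encode texts and return the encoder's last hidden states."""
    # Heavy dependencies are imported lazily so the helpers above
    # can be imported without torch or transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    # The tokenizer needs the sentencepiece package to be installed.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).to(device).eval()

    batch = tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt"
    ).to(device)
    with torch.no_grad():  # inference only, no gradients needed
        output = model(**batch)
    # Shape: (batch_size, sequence_length, hidden_size)
    return output.last_hidden_state
```

Calling `embed(["hôm nay vui quá"])` downloads the weights on first use and returns one vector per token. Pooling those vectors into a sentence embedding is left to the caller.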

Core Capabilities

Emotion Recognition
Hate Speech Detection
Sentiment Analysis
Spam Reviews Detection
Hate Speech Spans Detection
Fill-Mask Task Support
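
The fill-mask capability can be tried with the transformers `pipeline` API. This is a minimal sketch, not the authors' evaluation code: the Hub identifier `uitnlp/visobert` is an assumption, and `mask_word` and `predict_masked` are hypothetical helper names. The mask token is read from the tokenizer rather than hard-coded.

```python
from typing import List, Tuple


def mask_word(text: str, word: str, mask_token: str) -> str:
    """Replace the first occurrence of `word` in `text` with the mask token."""
    if word not in text:
        raise ValueError(f"{word!r} does not occur in the text")
    return text.replace(word, mask_token, 1)


def predict_masked(
    text: str,
    word: str,
    model_id: str = "uitnlp/visobert",  # assumed Hub identifier
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Mask `word` in `text` and return the model's top candidate tokens."""
    # Imported lazily so mask_word can be used without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_id)
    masked = mask_word(text, word, fill.tokenizer.mask_token)
    # With a single mask, the pipeline returns one list of candidate dicts.
    return [(p["token_str"], p["score"]) for p in fill(masked, top_k=top_k)]
```

For example, `predict_masked("hôm nay vui quá", "vui")` masks the word "vui" and returns up to five candidate tokens with their scores.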

Frequently Asked Questions

Q: What makes this model unique?

ViSoBERT is the first monolingual MLM built specifically for Vietnamese social media text. It outperforms earlier monolingual models, multilingual models, and multilingual social media models on downstream tasks such as emotion recognition, hate speech detection, and sentiment analysis.

Q: What are the recommended use cases?

The model is particularly suited for Vietnamese social media text analysis tasks including sentiment analysis, hate speech detection, emotion recognition, and spam detection. It's designed specifically for processing informal and social media content.
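
For classification-style use cases such as sentiment analysis, one common approach is to attach a sequence classification head and fine-tune it on labeled data. The sketch below only sets up the model. It assumes the Hub identifier `uitnlp/visobert`, and the label names in the usage note are hypothetical. The classification head is randomly initialized, so predictions are meaningless until the model is fine-tuned.

```python
from typing import Dict, List, Tuple


def label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id-to-label and label-to-id mappings used in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(labels: List[str], model_id: str = "uitnlp/visobert"):
    """Load the tokenizer and the encoder with an untrained classification head."""
    # Imported lazily so label_maps can be used without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

A call such as `load_classifier(["negative", "neutral", "positive"])` returns a tokenizer and a model that can be passed to a standard fine-tuning loop.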
