visobert

Maintained By
uitnlp

ViSoBERT

Property      Value
Paper         EMNLP 2023
Architecture  XLM-R based
Task Type     Fill-Mask, Social Media Text Processing
Language      Vietnamese

What is ViSoBERT?

ViSoBERT is the first monolingual masked language model (MLM) designed specifically for Vietnamese social media text. It was published at EMNLP 2023, follows the XLM-R architecture, and is pre-trained on a large-scale corpus of Vietnamese social media text.

Implementation Details

The model is used through the transformers library with a PyTorch backend and requires SentencePiece for tokenization. It follows the XLM-R architecture but is pre-trained specifically on Vietnamese text.

  • Pre-trained on diverse Vietnamese social media texts
  • Requires minimal dependencies (transformers and SentencePiece)
  • Supports both CPU and GPU inference
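
The loading steps described above can be sketched roughly as follows. This is a minimal sketch, not the maintainers' reference code: it assumes the checkpoint is available on the Hugging Face Hub under the id `uitnlp/visobert` (inferred from the maintainer and model name on this page) and that `torch`, `transformers`, and `sentencepiece` are installed. The helper names `pick_device`, `load_visobert`, and `embed` are hypothetical.

```python
MODEL_ID = "uitnlp/visobert"  # assumed Hub id (maintainer/model name), not confirmed here


def pick_device(cuda_available: bool) -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    return "cuda" if cuda_available else "cpu"


def load_visobert(model_id: str = MODEL_ID):
    """Load the tokenizer and encoder, placing the model on a GPU if one is available."""
    # Local imports keep this module importable without the heavy dependencies.
    import torch
    from transformers import AutoModel, AutoTokenizer

    device = pick_device(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(model_id)  # relies on sentencepiece
    model = AutoModel.from_pretrained(model_id).to(device).eval()
    return tokenizer, model, device


def embed(text: str, tokenizer, model, device: str):
    """Return the last hidden states for one sentence: (1, seq_len, hidden_size)."""
    import torch

    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():  # inference only, so gradients are not tracked
        return model(**inputs).last_hidden_state
```

A typical call would be `tokenizer, model, device = load_visobert()` followed by `embed(some_vietnamese_sentence, tokenizer, model, device)`. The first call downloads the weights, so it needs network access.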

Core Capabilities

  • Emotion Recognition
  • Hate Speech Detection
  • Sentiment Analysis
  • Spam Reviews Detection
  • Hate Speech Spans Detection
  • Fill-Mask Task Support
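
Of the capabilities listed, fill-mask is the one a pre-trained masked language model supports without further training. A hedged sketch using the generic `fill-mask` pipeline from transformers is shown here. The Hub id `uitnlp/visobert` is an assumption inferred from this page, the helper names are hypothetical, and the code assumes exactly one mask token per input.

```python
def load_fill_mask(model_id: str = "uitnlp/visobert"):
    """Build a fill-mask pipeline (assumed Hub id; downloads weights on first use)."""
    from transformers import pipeline  # local import: heavy dependency

    return pipeline("fill-mask", model=model_id)


def mask_word(text: str, word: str, mask_token: str) -> str:
    """Replace the first occurrence of `word` with the tokenizer's mask token."""
    if word not in text:
        raise ValueError(f"{word!r} not found in text")
    return text.replace(word, mask_token, 1)


def top_predictions(fill_mask, masked_text: str, k: int = 5):
    """Return (token, score) pairs for the k most likely fillers of a single mask."""
    results = fill_mask(masked_text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]
```

In use, one would take the mask token from the loaded pipeline rather than hard-coding it, for example `fm = load_fill_mask()` and then `top_predictions(fm, mask_word(sentence, word, fm.tokenizer.mask_token))`.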

Frequently Asked Questions

Q: What makes this model unique?

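ViSoBERT is the first monolingual masked language model (MLM) built specifically for Vietnamese social media text. Its authors report that it outperforms previous monolingual, multilingual, and multilingual social media models on downstream tasks such as emotion recognition, hate speech detection, and sentiment analysis.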

Q: What are the recommended use cases?

The model is particularly suited for Vietnamese social media text analysis tasks including sentiment analysis, hate speech detection, emotion recognition, and spam detection. It's designed specifically for processing informal and social media content.
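
For sentence-level use cases such as sentiment analysis or hate speech detection, the pre-trained encoder would normally be fine-tuned with a classification head. The following is a minimal sketch of that setup, assuming the Hub id `uitnlp/visobert` and the standard `AutoModelForSequenceClassification` class from transformers. The helper names are hypothetical, any label names passed in are placeholders, and the head is randomly initialised until it is fine-tuned on labelled data.

```python
def make_label_maps(labels):
    """Build the id2label / label2id dicts stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(labels, model_id: str = "uitnlp/visobert"):
    """Load the encoder with a freshly initialised sequence-classification head."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = make_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

A span-level task such as hate speech spans detection labels individual tokens, so `AutoModelForTokenClassification` would be the analogous starting point there.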
