visobert-14gb-corpus

Maintained By
5CD-AI


Property         Value
Parameter Count  97.6M
Model Type       Fill-Mask Transformer
Architecture     XLM-RoBERTa
Tensor Type      F32

What is visobert-14gb-corpus?

visobert-14gb-corpus is a Vietnamese language model that builds on uitnlp/visobert and is pre-trained on a 14GB dataset comprising 100M Facebook comments, 15M Facebook posts, UIT data, and MC4 e-commerce content. It targets Vietnamese social media text and is reported to achieve state-of-the-art performance across multiple social media text analysis tasks.

Implementation Details

The model is used through the transformers library and is exposed as a fill-mask pipeline. For downstream tasks it was fine-tuned with the AdamW optimizer, 30 training epochs, and a maximum sequence length of 128 tokens. Batch sizes and learning rate schedulers varied by downstream task.

  • Pre-trained on diverse Vietnamese social media content
  • Achieves 82.2% average Macro F1-score across tasks
  • Optimized for emotion recognition, hate speech detection, spam detection, and hate speech spans detection
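As a minimal sketch of the fill-mask usage described above, the snippet below loads the model through the transformers `pipeline` helper and ranks the predicted tokens. The Hub id `5CD-AI/visobert-14gb-corpus` is an assumption derived from the maintainer and model name on this page, and `load_fill_mask`, `top_tokens`, `demo`, and the sample sentence are illustrative names, not part of the model's documentation.

```python
from typing import Dict, List

# Assumed Hub id (maintainer name + model name from this page); verify before use.
MODEL_ID = "5CD-AI/visobert-14gb-corpus"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline. transformers is imported lazily;
    the first call downloads the model weights."""
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_tokens(predictions: List[Dict], k: int = 3) -> List[str]:
    """Return the k highest-scoring token strings from fill-mask output.

    Each prediction is a dict with at least 'score' and 'token_str',
    which is the shape the fill-mask pipeline returns for a single mask.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def demo() -> None:
    """Illustrative run; requires network access and the transformers package."""
    fill_mask = load_fill_mask()
    mask = fill_mask.tokenizer.mask_token  # read the mask token from the tokenizer
    print(top_tokens(fill_mask(f"Sản phẩm này rất {mask}.")))
```

Calling `demo()` prints the top candidate tokens for the masked position. Reading the mask token from the tokenizer avoids hard-coding it.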

Core Capabilities

  • Emotion Recognition: 68.69% accuracy
  • Hate Speech Detection: 88.79% accuracy
  • Spam Reviews Detection: 91.02% accuracy
  • Hate Speech Spans Detection: 93.69% accuracy
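The figures on this page mix two metrics: accuracy for the individual tasks and macro F1 for the overall average. The sketch below shows, in plain Python, how the two can diverge on imbalanced data. The labels `clean` and `hate` and the toy predictions are hypothetical and are not taken from the model's evaluation.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either sequence."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: a classifier that always predicts "clean".
y_true = ["clean"] * 8 + ["hate"] * 2
y_pred = ["clean"] * 10
print(accuracy(y_true, y_pred))   # 0.8
print(macro_f1(y_true, y_pred))   # about 0.444, since the "hate" class scores 0
```

Because macro F1 weights every class equally, it penalizes a model that ignores a rare class, which is why it is a common choice for tasks such as hate speech detection.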

Frequently Asked Questions

Q: What makes this model unique?

This model distinguishes itself through its training on a diverse 14GB Vietnamese corpus and its results on social media analysis tasks, where it is reported to outperform predecessors such as PhoBERT and viBERT.

Q: What are the recommended use cases?

The model is particularly well suited to Vietnamese social media text analysis, including emotion detection, content moderation, spam detection, and hate speech identification on social media platforms and in e-commerce applications.
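For a use case such as spam or hate speech classification, a fine-tuning setup might look like the sketch below. It reuses the settings this page states (AdamW, 30 epochs, maximum sequence length 128). The batch size, learning rate, label count, Hub id, and the names `FineTuneConfig`, `build_classifier`, and `encode` are assumptions for illustration, since the page only says that batch sizes and schedulers varied by task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FineTuneConfig:
    model_id: str = "5CD-AI/visobert-14gb-corpus"  # assumed Hub id
    num_labels: int = 2          # hypothetical, e.g. spam vs. not spam
    max_seq_length: int = 128    # stated on this page
    num_epochs: int = 30         # stated on this page
    batch_size: int = 32         # placeholder, not stated on this page
    learning_rate: float = 2e-5  # placeholder, not stated on this page


def build_classifier(cfg: FineTuneConfig):
    """Load tokenizer, classification model, and an AdamW optimizer.

    Imports are lazy so the config can be inspected without torch or
    transformers installed; the first call downloads the weights.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(cfg.model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        cfg.model_id, num_labels=cfg.num_labels
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.learning_rate)
    return tokenizer, model, optimizer


def encode(tokenizer, texts: List[str], cfg: FineTuneConfig):
    """Tokenize a batch, truncating and padding to the configured length."""
    return tokenizer(
        texts,
        truncation=True,
        padding="max_length",
        max_length=cfg.max_seq_length,
        return_tensors="pt",
    )
```

The training loop itself is omitted; the returned objects plug into either a hand-written PyTorch loop or the transformers `Trainer`.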
