robbert-v2-dutch-base

Maintained by: pdelobelle

RobBERT v2 Dutch Base Model

Property          Value
Parameter Count   117M
License           MIT
Architecture      RoBERTa
Training Data     39GB Dutch OSCAR corpus

What is robbert-v2-dutch-base?

RobBERT is a state-of-the-art Dutch language model based on the RoBERTa architecture, developed by researchers at KU Leuven. It was pre-trained on 6.6 billion words (39GB of text) from the Dutch section of the OSCAR corpus, and it reports strong results across several Dutch natural language processing tasks, listed under Core Capabilities.

Implementation Details

The model implements the RoBERTa architecture with 12 self-attention layers of 12 heads each, for a total of 117M trainable parameters. It was trained with the Adam optimizer (beta_1=0.9, beta_2=0.98) and a polynomially decaying learning rate. Training used up to 80 GPUs spread across multiple computing nodes.
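As a rough consistency check on the 117M figure, the parameters of a RoBERTa-style encoder can be tallied by hand. The sketch below relies on values that are not stated in this description and are assumptions: a hidden size of 768, a feed-forward size of 3072, 514 position embeddings, and a 40,000-token vocabulary. The function name is hypothetical. Note that the number of attention heads does not change the count, since the heads only split the same projection matrices.

```python
def roberta_param_count(vocab_size=40_000, hidden=768, layers=12,
                        ffn=3072, max_positions=514, with_pooler=True):
    """Approximate trainable parameters of a RoBERTa-style encoder.

    All default sizes except `layers` are assumptions, not taken from the text.
    """
    # Token, position and single-segment embeddings, plus their LayerNorm.
    embeddings = (vocab_size + max_positions + 1) * hidden + 2 * hidden
    # Query, key, value and output projections (weights and biases).
    attention = 4 * (hidden * hidden + hidden)
    # The two dense layers of the feed-forward block.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer, each with a scale and a bias vector.
    layer_norms = 2 * (2 * hidden)
    per_layer = attention + feed_forward + layer_norms
    # Optional pooling layer on top of the first token.
    pooler = hidden * hidden + hidden if with_pooler else 0
    return embeddings + layers * per_layer + pooler


total = roberta_param_count()
print(f"{total / 1e6:.1f}M parameters")  # 116.8M under these assumptions
```

Under these assumed sizes the tally rounds to 117M, which matches the stated parameter count.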

  • Pre-trained on the masked language modeling (MLM) task
  • Trained for two epochs on 126 million lines of text
  • Uses weight decay of 0.1 and dropout of 0.1
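To make the learning-rate schedule concrete, here is a minimal sketch of polynomial decay with an optional linear warm-up. The peak learning rate, step counts, warm-up length, and decay power used below are illustrative assumptions, not the values used to train RobBERT; only the Adam betas and the weight decay come from the description above. The function name is hypothetical.

```python
def polynomial_decay_lr(step, peak_lr, total_steps, warmup_steps=0,
                        end_lr=0.0, power=1.0):
    """Learning rate at `step`: linear warm-up, then polynomial decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    # Fraction of the decay phase that is still ahead of us.
    remaining = 1.0 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


# Optimizer settings named in the text; how they are passed to an
# optimizer depends on the training framework.
ADAM_SETTINGS = {"betas": (0.9, 0.98), "weight_decay": 0.1}

# Illustrative schedule (all numbers here are assumptions).
for step in (0, 500, 1_000, 50_500, 100_000):
    lr = polynomial_decay_lr(step, peak_lr=1e-4, total_steps=100_000,
                             warmup_steps=1_000)
    print(step, lr)
```

With `power=1.0` the decay is linear; larger powers make the rate drop faster early in the decay phase.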

Core Capabilities

  • Sentiment Analysis (95.1% accuracy on Dutch Book Reviews)
  • Named Entity Recognition (89.08% F1 score)
  • Coreference Resolution, evaluated as die/dat disambiguation (99.23% accuracy)
  • Part-of-Speech Tagging (96.4% accuracy)
  • Zero-shot word prediction
  • Emotion detection
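Zero-shot word prediction uses the pre-trained masked language modeling head directly, with no fine-tuning. A minimal sketch follows, assuming the Hugging Face `transformers` package is installed and the model is published under the ID `pdelobelle/robbert-v2-dutch-base`. The helper names `mask_word` and `predict_masked` are hypothetical, and the example sentence is only for illustration.

```python
MODEL_ID = "pdelobelle/robbert-v2-dutch-base"  # assumed Hugging Face model ID


def mask_word(sentence, word, mask_token="<mask>"):
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(sentence, top_k=5):
    """Return (token, score) pairs for the single masked position.

    Requires `transformers` and downloads the model on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return [(p["token_str"], p["score"]) for p in fill_mask(sentence, top_k=top_k)]


masked = mask_word("Er staat een boom in mijn tuin.", "boom")
print(masked)  # Er staat een <mask> in mijn tuin.
# predictions = predict_masked(masked)  # uncomment to query the model
```

The model call is left commented out so the snippet runs without network access; the predicted tokens depend on the downloaded weights.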

Frequently Asked Questions

Q: What makes this model unique?

RobBERT v2 stands out for its performance on Dutch language tasks, particularly when only a small dataset is available for fine-tuning. It is described as the first Dutch language model to exceed 95% accuracy on sentiment analysis (95.1% on Dutch Book Reviews), and it also reports strong results on gender-related linguistic tasks.

Q: What are the recommended use cases?

The model is suited to Dutch NLP tasks including sentiment analysis, coreference resolution, named entity recognition, and part-of-speech tagging. It is particularly recommended when training data is limited, where it is reported to outperform other models by a clear margin.
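For a task such as sentiment analysis, the usual route is to attach a classification head and fine-tune on labeled data. The sketch below shows only the setup step and an accuracy helper, assuming the Hugging Face `transformers` package and the model ID `pdelobelle/robbert-v2-dutch-base`. The two-label scheme and the function names are hypothetical; the classification head starts untrained, so it gives meaningful predictions only after fine-tuning.

```python
MODEL_ID = "pdelobelle/robbert-v2-dutch-base"  # assumed Hugging Face model ID
ID2LABEL = {0: "negative", 1: "positive"}      # hypothetical label scheme
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_classifier():
    """Load the tokenizer and the encoder with a fresh classification head.

    Requires `transformers` and downloads the weights on first use.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
# tokenizer, model = load_classifier()  # uncomment to download the model
```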
