robbert-v2-dutch-base

pdelobelle

RobBERT v2 is a state-of-the-art Dutch language model with 117M parameters, trained on 39GB of Dutch text. It excels at NLP tasks such as sentiment analysis and named entity recognition (NER).

  • Parameter Count: 117M
  • License: MIT
  • Architecture: RoBERTa
  • Training Data: 39GB Dutch OSCAR corpus

What is robbert-v2-dutch-base?

RobBERT is a state-of-the-art Dutch language model based on RoBERTa architecture. Developed by researchers at KU Leuven, it's trained on 6.6 billion words from the Dutch section of the OSCAR corpus. This model represents a significant advancement in Dutch natural language processing, achieving superior performance across multiple tasks.

Implementation Details

The model implements the RoBERTa architecture with 12 self-attention layers and 12 heads, containing 117M trainable parameters. It was trained using the Adam optimizer with polynomial decay and specific hyperparameters (beta_1=0.9, beta_2=0.98). The training process utilized up to 80 GPUs across multiple computing nodes.

  • Pre-trained on masked language modeling task
  • Trained for two epochs on 126 million lines of text
  • Uses weight decay of 0.1 and dropout of 0.1
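The polynomial learning-rate decay mentioned above (linear warmup, then decay toward zero) can be sketched in plain Python. The peak learning rate, warmup steps, and total steps below are illustrative values for the sketch, not RobBERT's actual training configuration.

```python
def polynomial_decay_lr(step, peak_lr=6e-4, warmup_steps=1000,
                        total_steps=100_000, end_lr=0.0, power=1.0):
    """Linear warmup to peak_lr, then polynomial decay to end_lr.

    peak_lr, warmup_steps, and total_steps are illustrative,
    not the exact values used to train RobBERT.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        return end_lr
    # Fraction of the decay phase still remaining
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return end_lr + (peak_lr - end_lr) * remaining ** power

# Learning rate rises during warmup, peaks, then decays
schedule = [polynomial_decay_lr(s) for s in (0, 500, 1000, 50_000, 100_000)]
```

With `power=1.0` this reduces to linear decay; RoBERTa-style training typically pairs such a schedule with the Adam hyperparameters listed above (beta_1=0.9, beta_2=0.98).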

Core Capabilities

  • Sentiment Analysis (95.1% accuracy on Dutch Book Reviews)
  • Named Entity Recognition (89.08% accuracy)
  • Coreference Resolution (99.23% accuracy)
  • Part-of-Speech Tagging (96.4% accuracy)
  • Zero-shot word prediction
  • Emotion detection
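The zero-shot word prediction capability above maps directly onto the masked-language-modeling objective the model was pre-trained with. A minimal sketch using Hugging Face's fill-mask pipeline (assumes the transformers library is installed and the checkpoint can be downloaded; the Dutch example sentence is made up):

```python
from transformers import pipeline

# Load RobBERT v2 with its masked-language-modeling head
unmasker = pipeline("fill-mask", model="pdelobelle/robbert-v2-dutch-base")

# Predict the masked Dutch word; returns candidates ranked by score
results = unmasker("Er staat een <mask> in mijn tuin.")
for r in results:
    print(f"{r['token_str']!r}: {r['score']:.3f}")
```

Because RobBERT uses a RoBERTa-style tokenizer, the mask placeholder is `<mask>`; each returned candidate includes the predicted token and its probability.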

Frequently Asked Questions

Q: What makes this model unique?

RobBERT v2 stands out for its exceptional performance on Dutch language tasks, particularly with small datasets. It's the first Dutch language model to achieve over 95% accuracy on sentiment analysis and demonstrates superior performance in gender-related linguistic tasks.

Q: What are the recommended use cases?

The model excels in various Dutch NLP tasks including sentiment analysis, coreference resolution, named entity recognition, and part-of-speech tagging. It's particularly recommended for scenarios with limited training data, where it significantly outperforms other models.
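For the limited-data scenarios described above, a common starting point is fine-tuning the base model with a sequence-classification head. A minimal sketch (assumes transformers and torch are installed; the example sentence and its label are made up, and the 2-class head is randomly initialized until fine-tuned on a labeled Dutch dataset):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "pdelobelle/robbert-v2-dutch-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Attach a randomly initialized 2-class head (e.g. positive/negative sentiment)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# One toy example; a real run would train on a labeled corpus
# such as Dutch book reviews
batch = tokenizer(["Dit boek is fantastisch!"], return_tensors="pt")
labels = torch.tensor([1])  # hypothetical label: positive

outputs = model(**batch, labels=labels)
# outputs.loss is the cross-entropy loss; outputs.logits has shape (1, 2)
print(outputs.loss.item(), outputs.logits.shape)
```

From here the standard transformers Trainer (or a plain PyTorch loop) can fine-tune the whole model end to end.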
