# GottBERT Base Last
| Property | Value |
|---|---|
| Model Type | RoBERTa (German) |
| Parameter Count | 125 million |
| Training Data | German OSCAR Dataset (121GB) |
| License | MIT |
| Author | TUM |
| Model Link | https://huggingface.co/TUM/GottBERT_base_last |
## What is GottBERT_base_last?
GottBERT_base_last is a pure German language model based on the RoBERTa architecture, designed specifically for German natural language processing tasks. It is the final training checkpoint of the base model, pretrained on the German portion of the OSCAR dataset. The model performs strongly across a range of NLP tasks, including named entity recognition (NER), text classification, and natural language inference (NLI).
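As a quick usage sketch (assuming the checkpoint loads with the standard RoBERTa classes in transformers, which the model card implies but does not state), the model can be queried for masked-token predictions:

```python
from transformers import pipeline

# Masked-token prediction with the published checkpoint. This assumes
# the checkpoint is compatible with transformers' RoBERTa classes.
fill_mask = pipeline("fill-mask", model="TUM/GottBERT_base_last")

# RoBERTa-style checkpoints use <mask> as the mask token.
for prediction in fill_mask("Die Hauptstadt von Deutschland ist <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```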
## Implementation Details
The model has 12 transformer layers with 125 million parameters and uses a GPT-2-style byte-pair encoding (BPE) tokenizer with a 52k subword vocabulary (a tokenizer sketch follows the list below). Training was conducted with the Fairseq framework on TPUv3/v4 pods and completed in 1.2 days with a batch size of 8k and a peak learning rate of 0.0004.
- Architecture: 12-layer transformer with 125M parameters
- Training Infrastructure: 256-core TPUv3 pod / 128-core TPUv4 pod
- Training Duration: 1.2 days
- Tokenizer: GPT-2 BPE with 52k vocabulary
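To illustrate the tokenizer details, the sketch below loads the tokenizer with transformers and checks the reported 52k vocabulary; it assumes the Hugging Face checkpoint ships its tokenizer files alongside the weights:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TUM/GottBERT_base_last")

# The model card reports a 52k subword vocabulary.
print(tokenizer.vocab_size)

# GPT-2-style BPE operates on bytes, so umlauts and ß need no
# special preprocessing before tokenization.
print(tokenizer.tokenize("Müller überquerte die Straße."))
```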
## Core Capabilities
- Named Entity Recognition: 87.48% F1 on GermEval 2014
- Text Classification: 90.27% F1 on 10kGNAD
- Natural Language Inference: 81.04% accuracy on the German XNLI subset
- Optimized for German-language tasks (see the fine-tuning sketch after this list)
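The benchmark numbers above come from fine-tuning the checkpoint on each task. As a minimal, hypothetical sketch of token-classification fine-tuning with the transformers Trainer (the toy label set and two-sentence dataset below are illustrative stand-ins, not GermEval 2014):

```python
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "TUM/GottBERT_base_last"
labels = ["O", "B-PER", "B-LOC"]  # toy labels; GermEval 2014 uses a larger BIO set

# RoBERTa BPE tokenizers need add_prefix_space=True for pre-split words.
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels))

# Two toy sentences with word-level tags, just so the sketch runs end to end.
examples = {
    "tokens": [["Angela", "wohnt", "in", "Berlin"],
               ["Hans", "besucht", "München"]],
    "ner_tags": [[1, 0, 0, 2], [1, 0, 2]],
}

def encode(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True,
                    truncation=True, padding="max_length", max_length=32)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, row = None, []
        for wid in enc.word_ids(batch_index=i):
            # Label only the first subword of each word; mask the rest (-100).
            row.append(tags[wid] if wid is not None and wid != prev else -100)
            prev = wid
        enc["labels"].append(row)
    return enc

dataset = Dataset.from_dict(examples).map(encode, batched=True)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gottbert-ner-demo", num_train_epochs=1),
    train_dataset=dataset,
)
trainer.train()
```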
## Frequently Asked Questions
Q: What makes this model unique?
GottBERT is the first German-only RoBERTa model, optimized specifically for German-language tasks. Unlike multilingual models, which spread capacity across many languages, it focuses exclusively on German language understanding and processing.
Q: What are the recommended use cases?
The model excels in German NLP tasks including Named Entity Recognition, text classification, and natural language inference. It's particularly suitable for applications requiring deep understanding of German text, such as content classification, information extraction, and semantic analysis.
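For semantic-analysis use cases, one common approach (a sketch, not something the model card prescribes) is to mean-pool the encoder's hidden states into sentence embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "TUM/GottBERT_base_last"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["Der Vertrag wurde gekündigt.",
             "Die Kündigung des Vertrags ist wirksam."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0).item())
```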