# GottBERT Base Last
| Property | Value |
|---|---|
| Model Type | RoBERTa (German) |
| Parameter Count | 125 million |
| Training Data | German OSCAR Dataset (121GB) |
| License | MIT |
| Author | TUM |
| Model Link | https://huggingface.co/TUM/GottBERT_base_last |
## What is GottBERT_base_last?
GottBERT_base_last is a pure German language model based on the RoBERTa architecture, designed specifically for German natural language processing tasks. It is the final training checkpoint of the base model, pretrained on the German portion of the OSCAR dataset. The model performs strongly across a range of NLP tasks, including named entity recognition (NER), text classification, and natural language inference (NLI).
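As a quick usage sketch (assuming the checkpoint loads with the standard RoBERTa classes in transformers, which the model card implies but does not state), the model can be queried for masked-token predictions:

```python
from transformers import pipeline

# Masked-token prediction with the published checkpoint. This assumes
# the checkpoint is compatible with transformers' RoBERTa classes.
fill_mask = pipeline("fill-mask", model="TUM/GottBERT_base_last")

# RoBERTa-style checkpoints use <mask> as the mask token.
for prediction in fill_mask("Die Hauptstadt von Deutschland ist <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```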
## Implementation Details
The model has 12 transformer layers with 125 million parameters and uses a GPT-2-style byte-pair encoding (BPE) tokenizer with a 52k subword vocabulary (a tokenizer sketch follows the list below). Training was conducted with the Fairseq framework on TPUv3/v4 pods and completed in 1.2 days with a batch size of 8k and a peak learning rate of 0.0004.
- Architecture: 12-layer transformer with 125M parameters
- Training Infrastructure: 256-core TPUv3 pod / 128-core TPUv4 pod
- Training Duration: 1.2 days
- Tokenizer: GPT-2 BPE with 52k vocabulary
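To illustrate the tokenizer details, the sketch below loads the tokenizer with transformers and checks the reported 52k vocabulary; it assumes the Hugging Face checkpoint ships its tokenizer files alongside the weights:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TUM/GottBERT_base_last")

# The model card reports a 52k subword vocabulary.
print(tokenizer.vocab_size)

# GPT-2-style BPE operates on bytes, so umlauts and ß need no
# special preprocessing before tokenization.
print(tokenizer.tokenize("Müller überquerte die Straße."))
```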
## Core Capabilities
- Named Entity Recognition: 87.48% F1 on GermEval 2014
- Text Classification: 90.27% F1 on 10kGNAD
- Natural Language Inference: 81.04% accuracy on the German XNLI subset
- Optimized for German-language tasks (see the fine-tuning sketch after this list)
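The benchmark numbers above come from fine-tuning the checkpoint on each task. As a minimal, hypothetical sketch of token-classification fine-tuning with the transformers Trainer (the toy label set and two-sentence dataset below are illustrative stand-ins, not GermEval 2014):

```python
from datasets import Dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "TUM/GottBERT_base_last"
labels = ["O", "B-PER", "B-LOC"]  # toy labels; GermEval 2014 uses a larger BIO set

# RoBERTa BPE tokenizers need add_prefix_space=True for pre-split words.
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels))

# Two toy sentences with word-level tags, just so the sketch runs end to end.
examples = {
    "tokens": [["Angela", "wohnt", "in", "Berlin"],
               ["Hans", "besucht", "München"]],
    "ner_tags": [[1, 0, 0, 2], [1, 0, 2]],
}

def encode(batch):
    enc = tokenizer(batch["tokens"], is_split_into_words=True,
                    truncation=True, padding="max_length", max_length=32)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev, row = None, []
        for wid in enc.word_ids(batch_index=i):
            # Label only the first subword of each word; mask the rest (-100).
            row.append(tags[wid] if wid is not None and wid != prev else -100)
            prev = wid
        enc["labels"].append(row)
    return enc

dataset = Dataset.from_dict(examples).map(encode, batched=True)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gottbert-ner-demo", num_train_epochs=1),
    train_dataset=dataset,
)
trainer.train()
```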
## Frequently Asked Questions
Q: What makes this model unique?
GottBERT is the first German-only RoBERTa model, optimized specifically for German-language tasks. Unlike multilingual models, which spread capacity across many languages, it focuses exclusively on German language understanding and processing.
Q: What are the recommended use cases?
The model excels in German NLP tasks including Named Entity Recognition, text classification, and natural language inference. It's particularly suitable for applications requiring deep understanding of German text, such as content classification, information extraction, and semantic analysis.
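For semantic-analysis use cases, one common approach (a sketch, not something the model card prescribes) is to mean-pool the encoder's hidden states into sentence embeddings:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "TUM/GottBERT_base_last"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentences = ["Der Vertrag wurde gekündigt.",
             "Die Kündigung des Vertrags ist wirksam."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

# Mean-pool over non-padding tokens to get one vector per sentence.
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0).item())
```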