bert-base-german-uncased

dbmdz

German BERT model (111M parameters) trained on a 16GB corpus containing 2.35B tokens. This uncased version is optimized for German NLP tasks and is compatible with PyTorch.

  • Parameter Count: 111M
  • License: MIT
  • Framework: PyTorch
  • Dataset Size: 16GB (2.35B tokens)
  • Author: dbmdz (Bavarian State Library)

What is bert-base-german-uncased?

BERT Base German Uncased is a state-of-the-art language model developed by the MDZ Digital Library team at the Bavarian State Library. It's a transformer-based model specifically trained on a diverse German language corpus, offering robust language understanding capabilities for German text processing tasks.

Implementation Details

The model was trained on an extensive dataset combining Wikipedia dumps, the EU Bookshop corpus, Open Subtitles, CommonCrawl, ParaCrawl, and News Crawl. The training process used spaCy for sentence splitting and followed SciBERT's preprocessing methodology. The model was trained for 1.5M steps with a sequence length of 512 subwords.

  • Comprehensive vocabulary based on German text corpus
  • PyTorch compatibility through Hugging Face Transformers
  • Trained with state-of-the-art transformer architecture
  • Optimized for German language understanding

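Because the model is published on the Hugging Face Hub, loading it takes only a few lines. A minimal sketch (assuming the `transformers` library is installed and the checkpoint ID `dbmdz/bert-base-german-uncased`):

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Download the uncased German BERT checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-uncased")
model = AutoModelForMaskedLM.from_pretrained("dbmdz/bert-base-german-uncased")

# The tokenizer lowercases its input, matching the uncased training regime.
tokens = tokenizer.tokenize("Die Bayerische Staatsbibliothek")
print(tokens)
```

Note that because the model is uncased, any casing in the input is discarded during tokenization.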
Core Capabilities

  • Fill-mask task performance
  • Text classification and token classification
  • Sequence classification tasks
  • Named Entity Recognition (NER)
  • Part of Speech (PoS) tagging

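The fill-mask capability can be exercised directly through the Transformers `pipeline` API, since the checkpoint ships with a pretrained masked-language-model head. A short sketch (the example sentence is illustrative):

```python
from transformers import pipeline

# Build a fill-mask pipeline backed by the German BERT checkpoint.
fill = pipeline("fill-mask", model="dbmdz/bert-base-german-uncased")

# Ask the model to predict the masked word in a German sentence.
preds = fill("Berlin ist die [MASK] von Deutschland.")
for p in preds:
    print(p["token_str"], round(p["score"], 3))
```

By default the pipeline returns the top five candidate tokens with their softmax scores.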
Frequently Asked Questions

Q: What makes this model unique?

This model stands out due to its extensive training on diverse German language sources and its optimization for uncased text processing, making it particularly suitable for applications where case sensitivity isn't crucial.

Q: What are the recommended use cases?

The model is ideal for German language processing tasks including text classification, named entity recognition, and general language understanding applications. It's particularly useful in scenarios where case-insensitive text processing is preferred.
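For tasks such as NER or PoS tagging, the base checkpoint provides only the encoder; a task head must be attached and fine-tuned on labeled data. A hedged sketch of the starting point, using an assumed label count of 9 (e.g. CoNLL-style BIO tags):

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# The checkpoint has no token-classification head: this call initializes one
# randomly on top of the pretrained encoder, so it must be fine-tuned before use.
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "dbmdz/bert-base-german-uncased",
    num_labels=9,  # assumed label-set size for illustration
)

inputs = tokenizer("angela merkel besuchte münchen.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)
print(logits.shape)
```

From here, standard fine-tuning (e.g. with the `Trainer` API) on an annotated German corpus produces a usable tagger.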
