bert-base-greek-uncased-v1
| Property | Value |
|---|---|
| Parameters | 110M |
| Architecture | 12-layer, 768-hidden, 12-heads |
| Paper | GREEK-BERT: The Greeks visiting Sesame Street |
| Training Data | Greek Wikipedia, EU Parliament Proceedings, OSCAR |
What is bert-base-greek-uncased-v1?
bert-base-greek-uncased-v1 is a BERT model trained specifically for the Greek language, developed by AUEB's Natural Language Processing Group. It was trained on a diverse corpus comprising the Greek Wikipedia, the Greek portion of the European Parliament proceedings, and the Greek portion of OSCAR, making it a strong general-purpose encoder for Greek language processing.
Implementation Details
The model follows the bert-base-uncased architecture but was pretrained from scratch on Greek text. Pretraining ran for 1 million steps with batches of 256 sequences of length 512, using an initial learning rate of 1e-4 on a Google Cloud TPU v3-8.
- Uncased model that works with lowercase, deaccented Greek text
- Native preprocessing (lowercasing and accent stripping) supported by the default tokenizer; see the usage sketch after this list
- State-of-the-art performance on Greek NLP tasks
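A minimal loading-and-encoding sketch, assuming the transformers library and the Hugging Face Hub ID nlpaueb/bert-base-greek-uncased-v1; the manual lowercasing/deaccenting helper is shown only for illustration and may be redundant with newer tokenizer versions that preprocess natively:

```python
import unicodedata

from transformers import AutoModel, AutoTokenizer

# Assumed Hub ID for bert-base-greek-uncased-v1
MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def strip_accents_and_lowercase(text: str) -> str:
    """Deaccent and lowercase Greek text to match the uncased training setup."""
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    ).lower()

text = "Αυτή είναι η Ελληνική έκδοση του BERT."
inputs = tokenizer(strip_accents_and_lowercase(text), return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings with shape (1, sequence_length, 768)
print(outputs.last_hidden_state.shape)
```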
Core Capabilities
- Named Entity Recognition (85.7% F1 score)
- Natural Language Inference (78.6% accuracy on XNLI)
- Masked Language Modeling for Greek text (a fill-mask sketch follows this list)
- Support for both PyTorch and TensorFlow 2 frameworks
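A short fill-mask sketch for the masked language modeling capability, again assuming the nlpaueb/bert-base-greek-uncased-v1 Hub ID and the transformers pipeline API; the example sentence is illustrative:

```python
from transformers import pipeline

# The fill-mask pipeline uses the model's pretrained MLM head.
fill_mask = pipeline("fill-mask", model="nlpaueb/bert-base-greek-uncased-v1")

# Predict the masked token in a lowercase, deaccented Greek sentence.
for prediction in fill_mask("σημερα ειναι μια ομορφη [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```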
Frequently Asked Questions
Q: What makes this model unique?
This is the first BERT model specifically trained for Greek language processing, showing superior performance compared to multilingual models like M-BERT and XLM-R on Greek NLP tasks.
Q: What are the recommended use cases?
The model excels at Greek NLP tasks such as named entity recognition and natural language inference, and it can be fine-tuned for downstream tasks including text classification, question answering, and sentiment analysis in Greek; a fine-tuning sketch follows below.
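A hedged fine-tuning sketch for Greek text classification, assuming the same Hub ID plus the datasets library; the two toy examples, the two-label setup, and the output directory name are placeholders for a real labeled Greek dataset:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels is a placeholder for the downstream task (here, 2-class sentiment).
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Toy labeled examples; replace with a real Greek classification dataset.
data = Dataset.from_dict({
    "text": ["υπεροχη ταινια, τη συνιστω", "χασιμο χρονου, πολυ κακη"],
    "label": [1, 0],
})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="greek-bert-cls",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        learning_rate=2e-5,
    ),
    train_dataset=data,
)
trainer.train()
```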