bert-base-greek-uncased-v1
| Property | Value |
|---|---|
| Parameters | 110M |
| Architecture | 12-layer, 768-hidden, 12-heads |
| Paper | GREEK-BERT: The Greeks visiting Sesame Street |
| Training Data | Greek Wikipedia, EU Parliament Proceedings, OSCAR |
What is bert-base-greek-uncased-v1?
bert-base-greek-uncased-v1 is a BERT model trained specifically for the Greek language, developed by AUEB's Natural Language Processing Group. It was trained on a diverse corpus comprising the Greek Wikipedia, the Greek portion of the European Parliament proceedings, and the Greek portion of OSCAR, making it a strong general-purpose encoder for Greek language processing.
Implementation Details
The model follows the bert-base-uncased architecture but was pretrained from scratch on Greek text. Pretraining ran for 1 million steps with batches of 256 sequences of length 512, using an initial learning rate of 1e-4 on a Google Cloud TPU v3-8.
- Uncased model that works with lowercase, deaccented Greek text
- Native preprocessing (lowercasing and accent stripping) supported by the default tokenizer; see the usage sketch after this list
- State-of-the-art performance on Greek NLP tasks
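A minimal loading-and-encoding sketch, assuming the transformers library and the Hugging Face Hub ID nlpaueb/bert-base-greek-uncased-v1; the manual lowercasing/deaccenting helper is shown only for illustration and may be redundant with newer tokenizer versions that preprocess natively:

```python
import unicodedata

from transformers import AutoModel, AutoTokenizer

# Assumed Hub ID for bert-base-greek-uncased-v1
MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

def strip_accents_and_lowercase(text: str) -> str:
    """Deaccent and lowercase Greek text to match the uncased training setup."""
    return "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    ).lower()

text = "Αυτή είναι η Ελληνική έκδοση του BERT."
inputs = tokenizer(strip_accents_and_lowercase(text), return_tensors="pt")
outputs = model(**inputs)

# Contextual embeddings with shape (1, sequence_length, 768)
print(outputs.last_hidden_state.shape)
```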
Core Capabilities
- Named Entity Recognition (85.7% F1 score)
- Natural Language Inference (78.6% accuracy on XNLI)
- Masked Language Modeling for Greek text (a fill-mask sketch follows this list)
- Support for both PyTorch and TensorFlow 2 frameworks
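A short fill-mask sketch for the masked language modeling capability, again assuming the nlpaueb/bert-base-greek-uncased-v1 Hub ID and the transformers pipeline API; the example sentence is illustrative:

```python
from transformers import pipeline

# The fill-mask pipeline uses the model's pretrained MLM head.
fill_mask = pipeline("fill-mask", model="nlpaueb/bert-base-greek-uncased-v1")

# Predict the masked token in a lowercase, deaccented Greek sentence.
for prediction in fill_mask("σημερα ειναι μια ομορφη [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```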
Frequently Asked Questions
Q: What makes this model unique?
This is the first BERT model specifically trained for Greek language processing, showing superior performance compared to multilingual models like M-BERT and XLM-R on Greek NLP tasks.
Q: What are the recommended use cases?
The model excels at Greek NLP tasks such as named entity recognition and natural language inference, and it can be fine-tuned for downstream tasks including text classification, question answering, and sentiment analysis in Greek; a fine-tuning sketch follows below.
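A hedged fine-tuning sketch for Greek text classification, assuming the same Hub ID plus the datasets library; the two toy examples, the two-label setup, and the output directory name are placeholders for a real labeled Greek dataset:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"  # assumed Hub ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# num_labels is a placeholder for the downstream task (here, 2-class sentiment).
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Toy labeled examples; replace with a real Greek classification dataset.
data = Dataset.from_dict({
    "text": ["υπεροχη ταινια, τη συνιστω", "χασιμο χρονου, πολυ κακη"],
    "label": [1, 0],
})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="greek-bert-cls",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        learning_rate=2e-5,
    ),
    train_dataset=data,
)
trainer.train()
```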