BERT-Base-Uncased-EURLEX
| Property | Value |
|---|---|
| Model Type | BERT |
| Parameters | 110M |
| Architecture | 12-layer, 768-hidden, 12-heads |
| Paper | LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., Findings of EMNLP 2020) |
| Training Data | 116,062 EU legislation documents |
What is bert-base-uncased-eurlex?
BERT-base-uncased-eurlex is a specialized BERT model designed specifically for processing European Union legal texts. It's part of the LEGAL-BERT family of models, developed by researchers at AUEB's Natural Language Processing Group. This variant has been pre-trained exclusively on EU legislation from EURLEX, making it particularly effective for tasks involving European legal documents.
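The checkpoint can be loaded like any other BERT encoder through the Hugging Face transformers library. The following is a minimal sketch, assuming the model is published under the AUEB group's nlpaueb/bert-base-uncased-eurlex identifier:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face model ID for this variant (the AUEB NLP group's release)
model_name = "nlpaueb/bert-base-uncased-eurlex"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short passage of EU-style legal text and inspect the contextual embeddings
text = "The Commission shall adopt implementing acts laying down detailed rules."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```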
Implementation Details
The model follows BERT's base architecture but is trained from scratch on legal text. Training ran for 1 million steps in batches of 256 sequences of up to 512 tokens, with an initial learning rate of 1e-4, on a Google Cloud TPU v3-8; a quick masked-token check is sketched after the list below.
- Pre-trained exclusively on 116,062 EU legislation documents
- Implements BERT-base architecture (12-layer, 768-hidden, 12-heads)
- Optimized for legal domain vocabulary and context
- Trained using the official Google BERT codebase
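Because pre-training uses BERT's masked-language-model objective, a quick way to probe the learned legal vocabulary is a fill-mask query. This is a sketch only: the model ID is assumed and the sentence is an illustrative legal-style prompt, not an example from the training data.

```python
from transformers import pipeline

# Assumed model ID; fill-mask uses the pre-trained MLM head directly
fill_mask = pipeline("fill-mask", model="nlpaueb/bert-base-uncased-eurlex")

# Typical EU legislative phrasing with one token masked out
prompt = "This Regulation shall enter into [MASK] on the twentieth day following its publication."
for prediction in fill_mask(prompt):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```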
Core Capabilities
- Outperforms general-purpose BERT on EU legislation-related NLP tasks in the LEGAL-BERT evaluations
- Stronger grasp of legal vocabulary, terminology, and context than general-domain models
- Effective for tasks like legal document classification and information extraction (see the sketch after this list)
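For the classification use case, the usual pattern is to add a classification head on top of the encoder and fine-tune it on labelled documents. The sketch below makes several assumptions not stated in this card: the model ID, the number of labels, and the multi-label setup are placeholders for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed model ID; num_labels and the multi-label problem type are placeholders
model_name = "nlpaueb/bert-base-uncased-eurlex"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=4,  # e.g. a small set of subject-matter categories
    problem_type="multi_label_classification",
)

document = (
    "Council Regulation laying down common rules for direct support schemes "
    "under the common agricultural policy."
)
inputs = tokenizer(document, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Note: the classification head is freshly initialized here, so these scores are
# meaningless until the model is fine-tuned on labelled legal documents.
print(torch.sigmoid(logits))  # one independent probability per label
```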
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically trained on EU legislation, making it highly specialized for European legal text processing. Unlike general-purpose BERT models, it has learned the specific patterns and terminology used in EU legal documents.
Q: What are the recommended use cases?
The model is best suited for tasks involving EU legislation such as legal document classification, information extraction from legal texts, legal document search and retrieval, and automated legal analysis of EU documents.
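For the search and retrieval use case, one common approach (not prescribed by this card) is to mean-pool the encoder's hidden states into document embeddings and rank candidates by cosine similarity. A sketch under those assumptions, again using the assumed nlpaueb/bert-base-uncased-eurlex ID and made-up example texts:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nlpaueb/bert-base-uncased-eurlex"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(texts):
    """Mean-pool token embeddings over real (non-padding) tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, 768)

query = embed(["data protection obligations of controllers"])
documents = embed([
    "Regulation on the protection of natural persons with regard to the processing of personal data",
    "Directive establishing a framework for the recovery and resolution of credit institutions",
])

scores = torch.nn.functional.cosine_similarity(query, documents)
print(scores)  # higher score = more relevant document for the query
```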