BERT-Base-Uncased-EURLEX
| Property | Value |
|---|---|
| Model Type | BERT |
| Parameters | 110M |
| Architecture | 12-layer, 768-hidden, 12-heads |
| Paper | LEGAL-BERT: The Muppets straight out of Law School (Chalkidis et al., Findings of EMNLP 2020) |
| Training Data | 116,062 EU legislation documents |
What is bert-base-uncased-eurlex?
BERT-base-uncased-eurlex is a specialized BERT model designed specifically for processing European Union legal texts. It's part of the LEGAL-BERT family of models, developed by researchers at AUEB's Natural Language Processing Group. This variant has been pre-trained exclusively on EU legislation from EURLEX, making it particularly effective for tasks involving European legal documents.
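The checkpoint can be loaded like any other BERT encoder through the Hugging Face transformers library. The following is a minimal sketch, assuming the model is published under the AUEB group's nlpaueb/bert-base-uncased-eurlex identifier:

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hugging Face model ID for this variant (the AUEB NLP group's release)
model_name = "nlpaueb/bert-base-uncased-eurlex"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a short passage of EU-style legal text and inspect the contextual embeddings
text = "The Commission shall adopt implementing acts laying down detailed rules."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```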
Implementation Details
The model follows BERT's base architecture but is trained from scratch on legal text. Training ran for 1 million steps in batches of 256 sequences of up to 512 tokens, with an initial learning rate of 1e-4, on a Google Cloud TPU v3-8; a quick masked-token check is sketched after the list below.
- Pre-trained exclusively on 116,062 EU legislation documents
- Implements BERT-base architecture (12-layer, 768-hidden, 12-heads)
- Optimized for legal domain vocabulary and context
- Trained using the official Google BERT codebase
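Because pre-training uses BERT's masked-language-model objective, a quick way to probe the learned legal vocabulary is a fill-mask query. This is a sketch only: the model ID is assumed and the sentence is an illustrative legal-style prompt, not an example from the training data.

```python
from transformers import pipeline

# Assumed model ID; fill-mask uses the pre-trained MLM head directly
fill_mask = pipeline("fill-mask", model="nlpaueb/bert-base-uncased-eurlex")

# Typical EU legislative phrasing with one token masked out
prompt = "This Regulation shall enter into [MASK] on the twentieth day following its publication."
for prediction in fill_mask(prompt):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```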
Core Capabilities
- Outperforms general-purpose BERT on EU legislation-related NLP tasks in the LEGAL-BERT evaluations
- Stronger grasp of legal vocabulary, terminology, and context than general-domain models
- Effective for tasks like legal document classification and information extraction (see the sketch after this list)
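For the classification use case, the usual pattern is to add a classification head on top of the encoder and fine-tune it on labelled documents. The sketch below makes several assumptions not stated in this card: the model ID, the number of labels, and the multi-label setup are placeholders for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed model ID; num_labels and the multi-label problem type are placeholders
model_name = "nlpaueb/bert-base-uncased-eurlex"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=4,  # e.g. a small set of subject-matter categories
    problem_type="multi_label_classification",
)

document = (
    "Council Regulation laying down common rules for direct support schemes "
    "under the common agricultural policy."
)
inputs = tokenizer(document, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Note: the classification head is freshly initialized here, so these scores are
# meaningless until the model is fine-tuned on labelled legal documents.
print(torch.sigmoid(logits))  # one independent probability per label
```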
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically trained on EU legislation, making it highly specialized for European legal text processing. Unlike general-purpose BERT models, it has learned the specific patterns and terminology used in EU legal documents.
Q: What are the recommended use cases?
The model is best suited for tasks involving EU legislation such as legal document classification, information extraction from legal texts, legal document search and retrieval, and automated legal analysis of EU documents.
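For the search and retrieval use case, one common approach (not prescribed by this card) is to mean-pool the encoder's hidden states into document embeddings and rank candidates by cosine similarity. A sketch under those assumptions, again using the assumed nlpaueb/bert-base-uncased-eurlex ID and made-up example texts:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "nlpaueb/bert-base-uncased-eurlex"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(texts):
    """Mean-pool token embeddings over real (non-padding) tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, seq_len, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (batch, 768)

query = embed(["data protection obligations of controllers"])
documents = embed([
    "Regulation on the protection of natural persons with regard to the processing of personal data",
    "Directive establishing a framework for the recovery and resolution of credit institutions",
])

scores = torch.nn.functional.cosine_similarity(query, documents)
print(scores)  # higher score = more relevant document for the query
```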