legal-bert-small-uncased

Legal domain-specific BERT model (small version) trained on 12GB of legal texts, offering performance comparable to BERT-BASE at 33% of its size and roughly 4x faster inference.

  • License: CC-BY-SA 4.0
  • Author: nlpaueb
  • Paper: LEGAL-BERT: The Muppets straight out of Law School (EMNLP 2020)
  • Task: Fill-Mask

What is legal-bert-small-uncased?

legal-bert-small-uncased is a lightweight variant of LEGAL-BERT, designed for natural language processing tasks in the legal domain. The model was trained on an extensive collection of 12GB of legal texts, including legislation, court cases, and contracts from various jurisdictions. What makes it particularly notable is its efficiency: it maintains performance comparable to larger models while being only 33% the size of BERT-BASE and approximately four times faster at inference.

Implementation Details

The model was trained on a diverse corpus of legal documents including EU legislation, UK legislation, European Court of Justice cases, ECHR cases, US court cases, and US contracts. It utilizes the same architecture as BERT but with reduced parameters for improved efficiency.

  • Training Data: Over 450,000 legal documents across multiple jurisdictions
  • Training Infrastructure: Google Cloud TPU v3-8
  • Pre-training Approach: 1 million training steps with batches of 256 sequences
  • Optimization: Initial learning rate of 1e-4
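Since the model card lists Fill-Mask as the supported task, its pre-training objective can be exercised directly. The sketch below is a minimal, hedged example: it assumes the model is published on the Hugging Face Hub under the id `nlpaueb/legal-bert-small-uncased` and that the `transformers` library is installed; the example sentence is illustrative, not from the original corpus.

```python
from transformers import pipeline

# Load the fill-mask pipeline for the small legal BERT variant.
# Model id assumed from the author/model names on this card.
fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-small-uncased")

# Ask the model to complete a legal-sounding sentence.
results = fill_mask("The parties agree to [MASK] this contract.")

# Each result carries a candidate token and its probability.
for r in results:
    print(r["token_str"], round(r["score"], 3))
```

Because the model is uncased, input text is lowercased by the tokenizer before masking candidates are scored.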

Core Capabilities

  • Specialized legal domain understanding and vocabulary
  • Efficient processing with reduced parameter count
  • Strong performance on legal text masked language modeling
  • Multi-jurisdictional legal knowledge
  • Support for various legal NLP tasks

Frequently Asked Questions

Q: What makes this model unique?

This model's uniqueness lies in its combination of efficiency and domain-specific expertise. It achieves comparable performance to larger legal language models while being significantly smaller and faster, making it ideal for resource-constrained environments.

Q: What are the recommended use cases?

The model is particularly well-suited for legal text analysis tasks including contract analysis, legal document classification, legal entity recognition, and legal text completion. It's especially valuable for applications requiring quick inference times while maintaining high accuracy in legal domain understanding.
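For use cases like document classification, a common pattern is to extract fixed-size sentence embeddings from the encoder and feed them to a downstream classifier. The following is a sketch under the same assumptions as above (Hub id `nlpaueb/legal-bert-small-uncased`, `transformers` and `torch` installed); the mean-pooling step is a standard technique, not something the model card prescribes.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the encoder and its tokenizer (model id assumed from this card).
tokenizer = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-small-uncased")
model = AutoModel.from_pretrained("nlpaueb/legal-bert-small-uncased")

sentences = [
    "This agreement shall be governed by the laws of England.",
    "The lessee shall pay rent on the first day of each month.",
]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings, masking out padding positions,
# to obtain one fixed-size vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # (num_sentences, hidden_size)
```

These vectors can then serve as features for a lightweight classifier, which pairs well with the model's small footprint in resource-constrained deployments.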
