legal-bert-dutch-english

Maintained By
Gerwin

Property             Value
Author               Gerwin
Base Architecture    mBERT
Training Data        184k legal documents (295M words)
Model Hub            Hugging Face

What is legal-bert-dutch-english?

legal-bert-dutch-english is a BERT model adapted to legal documents in Dutch and English. It starts from mBERT and is further trained on 184,000 legal documents in both languages, including regulations, decisions, directives, and parliamentary questions. Although this corpus is only 9% of the size of BERT's original training data, the model performs well on legal-domain tasks.

Implementation Details

The model was trained for 60,000 steps, which worked better empirically than the 100,000 steps suggested in the original BERT paper. It can be loaded with the Hugging Face Transformers library in either PyTorch or TensorFlow.

  • Training duration of 60k steps, chosen empirically
  • Bilingual coverage of Dutch and English legal texts
  • Loads through Hugging Face Transformers in PyTorch or TensorFlow
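
As a rough illustration of the Transformers integration described above, the sketch below loads the tokenizer and the PyTorch encoder and returns one [CLS] vector per input text. The Hub identifier `Gerwin/legal-bert-dutch-english` is an assumption derived from the author and model name on this page, and `load_encoder` and `embed` are hypothetical helper names, not part of any library.

```python
# Assumed Hub identifier (author name + model name); check it on the Hub before use.
MODEL_ID = "Gerwin/legal-bert-dutch-english"


def load_encoder(model_id: str = MODEL_ID):
    """Download (or read from the local cache) the tokenizer and PyTorch encoder."""
    # Imported lazily so that defining the helper does not require the library.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


def embed(texts, tokenizer, model):
    """Return the final-layer [CLS] vector for each text in `texts`."""
    import torch

    # BERT-style encoders accept at most 512 tokens, so longer inputs are truncated.
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    model.eval()
    with torch.no_grad():
        output = model(**batch)
    return output.last_hidden_state[:, 0]


# Usage (downloads the weights on first call):
#   tokenizer, model = load_encoder()
#   vectors = embed(
#       ["De rechtbank verklaart het beroep ongegrond.",
#        "The court dismisses the appeal as unfounded."],
#       tokenizer, model,
#   )
```

Taking the [CLS] vector is only one possible pooling choice; mean pooling over tokens is a common alternative, and this page does not say which one suits this model better.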

Core Capabilities

  • Legal topic classification with an F1 score of 0.786 for both Dutch and English
  • Multi-class classification of mixed-language legal documents
  • Outperforms mBERT on legal document classification tasks
  • Handles long legal documents in both languages
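
For readers who want to check a figure such as the F1 score quoted above against their own predictions, here is a minimal pure-Python sketch of per-class and macro-averaged F1. This page does not state which averaging the reported 0.786 uses, so macro averaging is an assumption here, and the topic labels in the usage note are made up for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for single-label, multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    if not scores:
        raise ValueError("macro_f1 needs at least one label")
    return sum(scores.values()) / len(scores)
```

With the hypothetical labels `y_true = ["tax", "tax", "labour", "privacy"]` and `y_pred = ["tax", "labour", "labour", "privacy"]`, the per-class scores are 2/3 for "tax", 2/3 for "labour", and 1 for "privacy", so `macro_f1` returns 7/9, roughly 0.778.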

Frequently Asked Questions

Q: What makes this model unique?

The model handles both Dutch and English legal documents within a single architecture, so separate language-specific models are not needed. It performs competitively with specialized legal BERT models while also being bilingual.

Q: What are the recommended use cases?

The model is well suited to legal document classification, topic modeling, and analysis of regulatory texts in Dutch and English. It is especially useful for organizations that handle multilingual legal documentation, as shown by its application to the Rabobank dataset classification task.
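
A document classification setup along these lines could be sketched as follows, using the sequence classification head from Transformers. The Hub identifier, the helper names, and the topic labels are all assumptions for illustration. The classification head created this way starts with random weights, so it has to be fine-tuned on labelled documents before `predict_topic` returns anything meaningful.

```python
# Assumed Hub identifier (author name + model name); check it on the Hub before use.
MODEL_ID = "Gerwin/legal-bert-dutch-english"


def build_label_maps(labels):
    """Map topic names to integer ids and back, in the order given."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(labels, model_id: str = MODEL_ID):
    """Load the encoder with a new, untrained classification head on top."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


def predict_topic(text, tokenizer, model):
    """Return the highest-scoring topic name for a single document."""
    import torch

    # Inputs beyond 512 tokens are truncated; long documents may need chunking.
    batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**batch).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]


# Usage, after fine-tuning the head on labelled data (labels are hypothetical):
#   tokenizer, model = load_classifier(["tax", "labour", "privacy"])
#   predict_topic("De werkgever heeft de arbeidsovereenkomst opgezegd.", tokenizer, model)
```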
