# BERTimbaULaw: Portuguese Legal Language Model
| Property | Value |
|---|---|
| Base Model | neuralmind/bert-base-portuguese-cased |
| Final Loss | 0.6440 |
| Training Epochs | 15.0 |
| Paper | JurisBERT Paper |
## What is bertimbaulaw-base-portuguese-cased?
BERTimbaULaw is a Portuguese language model fine-tuned for legal-domain applications. Built on the neuralmind/bert-base-portuguese-cased architecture, it is adapted to Portuguese legal text and reached a final validation loss of 0.6440.
## Implementation Details
The model was trained with the Adam optimizer (betas=(0.9, 0.999)), a linear learning-rate schedule with 10,000 warmup steps, and native AMP mixed-precision training. Training ran for 170,000 steps across 15 epochs with an effective batch size of 128 (a per-device batch size of 16 with 8 gradient accumulation steps).
- Learning rate: 0.0001
- Batch size: 16 per device (128 effective with gradient accumulation)
- Training framework: Transformers 4.12.5 with PyTorch 1.10.1
- Gradient accumulation steps: 8
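The hyperparameters above can be collected into a Hugging Face `TrainingArguments` object. This is a hedged reconstruction: the argument names are standard Transformers options, but the original training script is not published in this card, and `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; the actual
# training script for BERTimbaULaw is not included in this card.
training_args = TrainingArguments(
    output_dir="bertimbaulaw-mlm",   # placeholder path
    learning_rate=1e-4,              # reported learning rate
    per_device_train_batch_size=16,  # 16 x 8 accumulation = 128 effective
    gradient_accumulation_steps=8,
    max_steps=170_000,               # reported total steps
    warmup_steps=10_000,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    fp16=True,                       # native AMP mixed precision
)
```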
## Core Capabilities
- Specialized in Portuguese legal text processing
- Optimized for cased text input
- Demonstrated consistent performance improvement during training
- Suitable for legal document analysis and classification
## Frequently Asked Questions
**Q: What makes this model unique?**
This model specializes in Portuguese legal text processing and uses an approach that converts a classification corpus into STS (Semantic Textual Similarity) data, as detailed in the JurisBERT paper.
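The classification-to-STS conversion can be illustrated with a toy sketch. This is only an assumption about the general idea (pairs drawn from the same class are treated as similar, cross-class pairs as dissimilar); the JurisBERT paper's exact pairing and scoring scheme may differ.

```python
from itertools import combinations

def classification_to_sts(examples):
    """Turn (text, label) examples into (text_a, text_b, similarity) pairs.

    Same-label pairs get similarity 1.0, cross-label pairs 0.0 -- a
    simplified stand-in for the conversion described in the paper.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        similarity = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, similarity))
    return pairs

# Hypothetical mini-corpus of labeled legal texts
corpus = [
    ("Ação de despejo por falta de pagamento", "locacao"),
    ("Rescisão de contrato de aluguel", "locacao"),
    ("Pedido de guarda compartilhada", "familia"),
]
sts_data = classification_to_sts(corpus)
```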
**Q: What are the recommended use cases?**
The model is particularly suited for legal document processing, text classification, and semantic analysis within the Portuguese legal domain. It's optimized for tasks requiring understanding of formal legal language and terminology.
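A minimal masked-language-model usage sketch with the Transformers `pipeline` API. The fine-tuned checkpoint's Hub id is not stated in this card, so the snippet loads the base model neuralmind/bert-base-portuguese-cased as a stand-in; substitute the BERTimbaULaw checkpoint id once it is available.

```python
from transformers import pipeline

# Stand-in checkpoint: replace with the BERTimbaULaw Hub id.
fill_mask = pipeline("fill-mask", model="neuralmind/bert-base-portuguese-cased")

predictions = fill_mask("O réu foi condenado ao pagamento de [MASK].")
for p in predictions:
    print(p["token_str"], round(p["score"], 3))
```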