Legal BERTimbau: Portuguese Legal Language Model
| Property | Value |
|---|---|
| Parameter Count | 334M |
| License | MIT |
| Language | Portuguese |
| Framework | PyTorch, Transformers |
| Primary Task | Sentence Similarity |
What is bert-large-portuguese-cased-legal-mlm-nli-sts-v1?
This is a BERT model specialized for Portuguese legal text analysis. Built on the BERTimbau large architecture, it was further trained on legal documents and optimized for semantic similarity tasks. The model maps sentences and paragraphs to a 1024-dimensional dense vector space, making it particularly effective for clustering and semantic search in legal contexts.
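A minimal usage sketch with the sentence-transformers library. The repo id shown is the bare model name from this card; the publishing organization prefix is not stated here, so prefix it as needed. The import is deferred inside the function so the heavy dependency (and model download) is only triggered when the function is called:

```python
def embed_sentences(sentences, model_name="bert-large-portuguese-cased-legal-mlm-nli-sts-v1"):
    """Encode sentences into 1024-dimensional dense vectors."""
    # Deferred import: sentence-transformers and the large model weights
    # are only loaded when this function is actually called.
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
    # Note: prepend the publishing organization to model_name (org prefix is
    # not given in this card, so the bare name here is an assumption).
    model = SentenceTransformer(model_name)
    return model.encode(sentences)  # shape: (len(sentences), 1024)
```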
Implementation Details
The model underwent a three-stage training process: MLM training on 30,000 legal documents for 15,000 training steps, followed by NLI training, then fine-tuning for Semantic Textual Similarity on multiple datasets, including ASSIN, ASSIN2, and stsb_multi_mt.
- Masked language model (MLM) training with a 1e-5 learning rate
- NLI training with a batch size of 16 and a 2e-5 learning rate
- STS fine-tuning with specialized legal datasets
- 1024-dimensional output embeddings
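The hyperparameters above can be collected into a single configuration for reference. A sketch summarizing the values from this card; the dataset identifiers follow Hugging Face naming conventions, and exact splits/versions are assumptions:

```python
# Training configuration summarized from the model card.
# Dataset names follow Hugging Face conventions; exact splits are assumptions.
TRAINING_CONFIG = {
    "mlm": {"documents": 30_000, "steps": 15_000, "learning_rate": 1e-5},
    "nli": {"batch_size": 16, "learning_rate": 2e-5},
    "sts": {"datasets": ["assin", "assin2", "stsb_multi_mt"]},
    "embedding_dim": 1024,
}
```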
Core Capabilities
- Semantic similarity computation for legal texts
- Dense vector representation of legal documents
- Usable via both the sentence-transformers library and the Hugging Face Transformers API
- High performance on Portuguese legal document analysis
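When using the raw Hugging Face Transformers API rather than sentence-transformers, a sentence embedding is typically derived by mean pooling the per-token embeddings under the attention mask. A minimal NumPy sketch of that arithmetic (shapes are illustrative; this is the standard pooling step, not code from the model repo):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[:, None].astype(np.float64)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)     # (dim,)
    count = np.clip(mask.sum(), 1e-9, None)            # avoid divide-by-zero
    return summed / count
```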
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for Portuguese legal text analysis, combining the BERT architecture with domain-specific training on legal documents. It achieves strong correlation scores on benchmark datasets, with Pearson correlations ranging from 0.77 to 0.83.
Q: What are the recommended use cases?
The model is ideal for legal document analysis tasks including semantic search in legal databases, document similarity comparison, and legal text clustering. It's particularly well-suited for applications in Portuguese legal institutions and research.
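For semantic search over a legal database, documents are ranked by cosine similarity between the query embedding and precomputed document embeddings. A minimal sketch with toy 2-D vectors standing in for the model's 1024-dimensional embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_documents(query_vec: np.ndarray, doc_vecs: list) -> list:
    """Return document indices sorted from most to least similar to the query."""
    sims = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
```

In practice the vectors would come from the model's `encode` output, with document embeddings computed once and cached.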