bert-large-portuguese-cased-legal-mlm-nli-sts-v1

Maintained By
stjiris

Legal BERTimbau: Portuguese Legal Language Model

Property          Value
Parameter Count   334M
License           MIT
Language          Portuguese
Framework         PyTorch, Transformers
Primary Task      Sentence Similarity

What is bert-large-portuguese-cased-legal-mlm-nli-sts-v1?

This is a BERT model specialized for Portuguese legal text. It is built on BERTimbau Large, a Portuguese BERT model, which was further trained on legal documents and then fine-tuned for semantic similarity. The model maps sentences and paragraphs to a 1024-dimensional dense vector space in which texts with similar meaning receive nearby vectors. This makes it suitable for clustering and semantic search in legal contexts.
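A minimal sketch of semantic search over these 1024-dimensional embeddings follows. It assumes the model loads through the sentence-transformers library under the Hub id stjiris/bert-large-portuguese-cased-legal-mlm-nli-sts-v1 (the title of this page). The library is imported lazily inside the encoding helper, so the ranking logic, which is plain cosine similarity, can be tried with any vectors. The names `encode` and `rank_by_similarity` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

MODEL_ID = "stjiris/bert-large-portuguese-cased-legal-mlm-nli-sts-v1"


def encode(texts: Sequence[str]) -> List[List[float]]:
    """Embed texts with the model (downloads it on first use)."""
    # Imported here so the rest of the module runs without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    return model.encode(list(texts)).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query: Sequence[float], corpus: Sequence[Sequence[float]], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs for the top_k most similar vectors."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

With the model installed, `vecs = encode([query] + documents)` followed by `rank_by_similarity(vecs[0], vecs[1:])` would rank the documents against the query.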

Implementation Details

Training proceeded in three stages. First came masked language modeling (MLM) on 30,000 legal documents for 15,000 training steps. Next came training on natural language inference (NLI) data. The final stage was fine-tuning for Semantic Textual Similarity (STS) on several datasets, including ASSIN, ASSIN2, and STSB multi_mt.

  • MLM training with a learning rate of 1e-5
  • NLI training with a batch size of 16 and a learning rate of 2e-5
  • STS fine-tuning with specialized legal datasets
  • 1024-dimensional output embeddings
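The reported hyperparameters can be collected in a small configuration object that makes the order of the stages explicit. This is a bookkeeping sketch only. The class, field, and function names are hypothetical. Values the text does not state, such as the MLM batch size or the STS learning rate, are left as None rather than guessed. The STS dataset list repeats only the datasets the text names, and that list may not be exhaustive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """One training stage with the hyperparameters reported for it."""
    name: str
    learning_rate: Optional[float] = None
    batch_size: Optional[int] = None
    steps: Optional[int] = None
    datasets: List[str] = field(default_factory=list)


# Stages in the order described above; None marks values the text does not give.
PIPELINE = [
    Stage("mlm", learning_rate=1e-5, steps=15_000,
          datasets=["30,000 legal documents"]),
    Stage("nli", learning_rate=2e-5, batch_size=16),
    # Datasets named in the text; the card says "including", so this may be partial.
    Stage("sts", datasets=["ASSIN", "ASSIN2", "STSB multi_mt"]),
]

EMBEDDING_DIM = 1024


def describe(pipeline: List[Stage]) -> List[str]:
    """One summary line per stage, skipping values that were not reported."""
    lines = []
    for stage in pipeline:
        parts = [f"{k}={v}" for k, v in vars(stage).items()
                 if k != "name" and v not in (None, [])]
        lines.append(f"{stage.name}: " + ", ".join(parts))
    return lines
```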

Core Capabilities

  • Semantic similarity computation for legal texts
  • Dense vector representation of legal documents
  • Usable through either the sentence-transformers library or plain HuggingFace Transformers
  • Targeted at Portuguese legal document analysis
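Using the model through plain HuggingFace Transformers instead of sentence-transformers means pooling the per-token vectors into a single sentence vector yourself. The sketch below writes out attention-masked mean pooling in plain Python so the arithmetic is visible. It assumes the model uses mean pooling, which is common for sentence-transformers models, but this should be checked against the model's own pooling configuration. The function name is hypothetical. With Transformers, the same computation would run over the model's `last_hidden_state` and the tokenizer's `attention_mask`.

```python
from typing import List, Sequence


def masked_mean_pool(
    token_embeddings: Sequence[Sequence[float]], attention_mask: Sequence[int]
) -> List[float]:
    """Average the token vectors whose mask value is 1 (padding is ignored)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask value per token is required")
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                total[i] += value
            count += 1
    # Guard against an all-padding input, like the usual clamp on the divisor.
    return [t / max(count, 1) for t in total]
```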

Frequently Asked Questions

Q: What makes this model unique?

This model is optimized specifically for Portuguese legal text, combining the BERT architecture with domain-specific training on legal documents. On the benchmark datasets reported for it, it reaches Pearson correlations between 0.77 and 0.83.
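The Pearson figures measure how linearly the model's similarity scores track human similarity ratings on a benchmark. Below is a minimal sketch of that metric. It assumes the predicted scores are cosine similarities of sentence pairs and the gold scores are the dataset's annotated ratings. The function name is hypothetical.

```python
import math
from typing import Sequence


def pearson(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted similarities and gold ratings."""
    n = len(predicted)
    if n != len(gold) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    return cov / math.sqrt(var_p * var_g)
```

Pearson correlation is unchanged by a positive linear rescaling of either input. Cosine scores can therefore be compared directly against ratings on a different scale, for example 0 to 5.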

Q: What are the recommended use cases?

The model is intended for legal document analysis tasks such as semantic search in legal databases, document similarity comparison, and legal text clustering. It is particularly suited to applications in Portuguese legal institutions and research.
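For the clustering use case, one simple option is greedy threshold clustering. Each document joins the first existing cluster whose seed it resembles closely enough and otherwise starts a new cluster. This generic technique was chosen for illustration and is not prescribed by the model's authors. The 0.75 threshold and the function name are hypothetical, and the threshold would need tuning on real embeddings from the model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def threshold_clusters(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.75
) -> List[List[int]]:
    """Group embedding indices; each cluster is compared through its first member."""
    clusters: List[List[int]] = []
    for i, vector in enumerate(embeddings):
        for cluster in clusters:
            if cosine(vector, embeddings[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No cluster seed was similar enough: start a new cluster.
            clusters.append([i])
    return clusters
```

The result depends on input order, which is the price of this simplicity. For large collections, an established method such as agglomerative clustering would usually be preferable.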
