ner-bert-base-cased-pt-lenerbr

Maintained By
pierreguillou

Property     Value
Task Type    Token Classification (NER)
Language     Portuguese
F1 Score     89.26%
Accuracy     97.59%
Downloads    108,927

What is ner-bert-base-cased-pt-lenerbr?

This is a Named Entity Recognition (NER) model specialized for the Portuguese legal domain. Built on the BERT architecture, it was fine-tuned on the LeNER-Br dataset to identify and classify legal entities in Portuguese texts. Its reported scores are highest for person names (98.3% F1) and temporal expressions (96.6% F1), followed by organizational entities (89.3% F1).
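A minimal sketch of how the model might be loaded and run with the Hugging Face transformers library. The hub ID is assumed from the maintainer and model name listed above; `build_ner_pipeline`, `format_entity`, and `demo` are hypothetical helper names, and the sample sentence is made up for illustration.

```python
# Assumption: the hub ID is "<maintainer>/<model name>" as listed on this page.
MODEL_ID = "pierreguillou/ner-bert-base-cased-pt-lenerbr"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entity(entity: dict) -> str:
    """Render one aggregated pipeline result as 'LABEL: text (score)'."""
    return f"{entity['entity_group']}: {entity['word']} ({entity['score']:.2f})"


def demo() -> None:
    """Run the model on a made-up sentence and print each detected entity."""
    ner = build_ner_pipeline()
    text = (
        "O Tribunal de Justiça do Estado de São Paulo julgou o recurso "
        "em 10 de março de 2020."
    )
    for entity in ner(text):
        print(format_entity(entity))
```

Calling `demo()` requires the transformers package, a backend such as PyTorch, and network access for the first download. The exact entities and scores depend on the model version, so no expected output is shown here.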

Implementation Details

The model was developed in two stages: the language model was first specialized on legal texts, then fine-tuned for NER. It uses a BERT base architecture with a cased tokenizer for Portuguese. The reported fine-tuning hyperparameters include a learning rate of 2e-5 and 2 gradient accumulation steps.

  • Trained on the LeNER-Br dataset with a legal domain focus
  • Implements transformer architecture with specialized Portuguese tokenization
  • Uses AdamW optimizer with linear learning rate scheduling
  • Trained for 10 epochs with batch size 4
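The hyperparameters listed above can be collected into a small configuration sketch. The learning rate, batch size, accumulation steps, and epoch count come from the text; the warmup value, the single-device assumption, and the names `FineTuneConfig` and `linear_lr` are illustrative assumptions, not the author's training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    learning_rate: float = 2e-5            # stated in the text
    batch_size: int = 4                    # stated in the text
    gradient_accumulation_steps: int = 2   # stated in the text
    num_epochs: int = 10                   # stated in the text
    warmup_steps: int = 0                  # assumption: not given in the text

    @property
    def effective_batch_size(self) -> int:
        """Examples per optimizer update, assuming a single device."""
        return self.batch_size * self.gradient_accumulation_steps


def linear_lr(step: int, total_steps: int, config: FineTuneConfig) -> float:
    """Linear schedule: optional warmup, then linear decay to zero."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < config.warmup_steps:
        # Ramp up from 0 to the base learning rate during warmup.
        return config.learning_rate * step / config.warmup_steps
    remaining = max(0, total_steps - step)
    decay_steps = max(1, total_steps - config.warmup_steps)
    return config.learning_rate * remaining / decay_steps


config = FineTuneConfig()
print(config.effective_batch_size)   # 4 * 2 = 8
print(linear_lr(500, 1000, config))  # roughly half the base rate
```

With gradient accumulation, each optimizer update sees 4 × 2 = 8 examples under the single-device assumption, which is why the two values are usually read together.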

Core Capabilities

  • Recognition of 6 entity types: JURISPRUDENCIA, LEGISLACAO, LOCAL, ORGANIZACAO, PESSOA, TEMPO
  • Best performance on person name recognition (98.7% precision)
  • Strong temporal expression detection (96.6% F1 score)
  • Context-aware tagging of legal documents: each token is classified in light of its surrounding text
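The six entity types above are typically exposed to a token classifier as per-token tags. The sketch below assumes an IOB2 tagging scheme (B- for the first token of an entity, I- for continuation, O for everything else); the label order, the function names, and the sample sentence are illustrative, and the model's own `id2label` mapping in its configuration is the authoritative source.

```python
ENTITY_TYPES = [
    "JURISPRUDENCIA", "LEGISLACAO", "LOCAL",
    "ORGANIZACAO", "PESSOA", "TEMPO",
]


def build_labels(entity_types):
    """Return 'O' plus a B-/I- pair per entity type (assumed IOB2 scheme)."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


def decode_spans(tokens, tags):
    """Group tokens into (entity_type, text) spans following IOB2 tags."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        # A B- tag, or an I- tag of a different type, starts a new span.
        starts_new = prefix == "B" or (prefix == "I" and entity_type != current_type)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):
            if not current_tokens:
                current_type = entity_type
            current_tokens.append(token)
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written tags for illustration, not actual model output.
tokens = ["Maria", "Silva", "mora", "em", "Brasília"]
tags = ["B-PESSOA", "I-PESSOA", "O", "O", "B-LOCAL"]
print(len(build_labels(ENTITY_TYPES)))  # 13 labels: O + 2 per entity type
print(decode_spans(tokens, tags))
```

Under this scheme, six entity types yield 13 labels in total, and the decoder turns a tag sequence back into whole entity spans.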

Frequently Asked Questions

Q: What makes this model unique?

The model's specialization in Portuguese legal text sets it apart. Its two-stage training (adapting the language model to legal text, then fine-tuning for NER) is credited with better results on this domain than a general-purpose BERT model. Its scores are highest for person and temporal entities, which makes it well suited to legal document processing.

Q: What are the recommended use cases?

The model is suited to legal document analysis, court document processing, legal research automation, and other NLP tasks involving Portuguese legal texts. Its main use is identifying legal references (legislation and case law), organizations, and person names in legal contexts.
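For document analysis, the detected entities are often collected per type, for example to list every statute or court cited in a ruling. The sketch below assumes entity dictionaries in the shape produced by a Hugging Face token-classification pipeline with aggregation enabled (`entity_group`, `word`, `score`); the `group_entities` helper, the 0.5 threshold, and the sample records are hypothetical.

```python
from collections import defaultdict


def group_entities(entities, min_score=0.5):
    """Collect unique entity texts per type, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # skip predictions below the (assumed) threshold
        text = entity["word"].strip()
        if text not in grouped[entity["entity_group"]]:
            grouped[entity["entity_group"]].append(text)
    return dict(grouped)


# Made-up records for illustration, not actual model output.
sample = [
    {"entity_group": "LEGISLACAO", "word": "Lei nº 8.666/93", "score": 0.95},
    {"entity_group": "ORGANIZACAO", "word": "Supremo Tribunal Federal", "score": 0.91},
    {"entity_group": "LEGISLACAO", "word": "Lei nº 8.666/93", "score": 0.93},
    {"entity_group": "PESSOA", "word": "João", "score": 0.30},
]
for entity_type, texts in group_entities(sample).items():
    print(entity_type, texts)
```

The threshold trades recall for precision; a suitable value would need to be chosen on held-out documents from the target corpus.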
