ner-bert-base-cased-pt-lenerbr

Property	Value
Task Type	Token Classification (NER)
Language	Portuguese
F1 Score	89.26%
Accuracy	97.59%
Downloads	108,927

What is ner-bert-base-cased-pt-lenerbr?

This is a Named Entity Recognition (NER) model for the Portuguese legal domain. Built on the BERT base architecture, it was fine-tuned on the LeNER-Br dataset to identify and classify entities in Portuguese legal texts. Per-entity results are strongest for person names (98.3% F1) and temporal expressions (96.6% F1), followed by organizations (89.3% F1).

Implementation Details

The model was developed in two stages: the language model was first specialized on legal texts, then fine-tuned for NER. It uses a BERT base architecture with tokenization suited to Portuguese legal terminology. Reported hyperparameters include a learning rate of 2e-5 and 2 gradient accumulation steps.

Trained on the LeNER-Br dataset with a legal domain focus
Uses the transformer architecture with specialized Portuguese tokenization
Uses AdamW optimizer with linear learning rate scheduling
Trained for 10 epochs with batch size 4
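
The hyperparameters listed above can be collected into a small configuration object. The following is a minimal sketch, not the original training script. It assumes that the batch size of 4 is the size of each micro-batch before accumulation (if 4 already includes accumulation, the effective size would be 4, not 8), that the linear schedule decays to zero with no warmup, and it uses a made-up dataset size of 1000 examples. `FineTuneConfig` and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters quoted in the text; everything else is an assumption."""
    learning_rate: float = 2e-5
    batch_size: int = 4                   # assumed: micro-batch size before accumulation
    gradient_accumulation_steps: int = 2
    epochs: int = 10

    @property
    def effective_batch_size(self) -> int:
        # Gradients from several micro-batches are accumulated before each update.
        return self.batch_size * self.gradient_accumulation_steps

    def total_updates(self, num_examples: int) -> int:
        # Ceiling division: a final partial batch still triggers an update.
        per_epoch = -(-num_examples // self.effective_batch_size)
        return per_epoch * self.epochs

    def linear_lr(self, step: int, total_steps: int) -> float:
        # Linear decay from the peak rate to zero (assumed: no warmup phase).
        remaining = max(0, total_steps - step)
        return self.learning_rate * remaining / total_steps


cfg = FineTuneConfig()
total = cfg.total_updates(num_examples=1000)  # 1000 is a made-up dataset size
print(cfg.effective_batch_size, total, cfg.linear_lr(total // 2, total))
```

Under these assumptions each optimizer update sees 8 examples, and the learning rate falls to half its peak at the midpoint of training.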

Core Capabilities

Recognition of 6 entity types: JURISPRUDENCIA, LEGISLACAO, LOCAL, ORGANIZACAO, PESSOA, TEMPO
Best results on person names (98.7% precision)
Strong temporal expression detection (96.6% F1 score)
Context-aware entity tagging in legal documents
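
Token classification models usually emit one tag per token, which then has to be merged into entity spans. The sketch below shows one way to do that for the six entity types listed above. It assumes an IOB2 tagging scheme ("O" plus B- and I- tags per type); the label order shown here is illustrative and may not match the model's own label mapping. `group_entities` is a hypothetical helper, and the example sentence and tags are invented.

```python
ENTITY_TYPES = ["JURISPRUDENCIA", "LEGISLACAO", "LOCAL", "ORGANIZACAO", "PESSOA", "TEMPO"]

# Assumed IOB2 scheme: "O" plus a B- and an I- tag for each entity type.
LABELS = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]


def group_entities(tokens, tags):
    """Merge token-level IOB2 tags into (entity_type, text) spans."""
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            # Close the open span, if any, before handling the new tag.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            if prefix in ("B", "I"):  # a stray I- tag opens a new span
                current_type = entity_type
        if current_type is not None:
            current_tokens.append(token)
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Invented example: tokens with hand-written tags, not model output.
tokens = ["Lei", "8.112", "de", "1990", ",", "Supremo", "Tribunal", "Federal"]
tags = ["B-LEGISLACAO", "I-LEGISLACAO", "O", "B-TEMPO", "O",
        "B-ORGANIZACAO", "I-ORGANIZACAO", "I-ORGANIZACAO"]
print(group_entities(tokens, tags))
```

With six entity types, this scheme yields 13 labels in total.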

Frequently Asked Questions

Q: What makes this model unique?

The model's specialization in Portuguese legal text sets it apart. Its two-stage training first adapts the language model to legal text and then fine-tunes it for NER, an approach reported to outperform standard BERT models. It scores highest on person and temporal entities, which makes it well suited to legal document processing.

Q: What are the recommended use cases?

The model is suited to legal document analysis, court document processing, legal research automation, and other NLP tasks involving Portuguese legal texts. It identifies references to legislation and case law, organizations, and persons in legal contexts.
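
For these use cases, a common pattern is to run the model through the Hugging Face Transformers `pipeline` API and group the predicted entities by type. The sketch below assumes the model is published on the Hugging Face Hub as `pierreguillou/ner-bert-base-cased-pt-lenerbr` (the text above gives only the bare model name) and that the `transformers` package is installed. `load_ner`, `entities_by_type`, and the `min_score` threshold of 0.5 are hypothetical choices for illustration.

```python
from collections import defaultdict

# Assumed Hub identifier; the text above gives only the bare model name.
MODEL_ID = "pierreguillou/ner-bert-base-cased-pt-lenerbr"


def load_ner(model_id=MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy optional dependency
    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def entities_by_type(text, ner, min_score=0.5):
    """Group predicted entity strings by type, dropping low-confidence spans."""
    grouped = defaultdict(list)
    for ent in ner(text):
        if ent["score"] >= min_score:
            # Slice the original text so casing and spacing are preserved.
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


# Usage (requires network access and the transformers package):
# ner = load_ner()
# print(entities_by_type("Texto de um acordao ...", ner))
```

Passing the pipeline in as an argument keeps `entities_by_type` easy to test with a stub and lets one loaded model be reused across many documents.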