postagger-portuguese

Maintained By
lisaterumi

Property        Value
Author          lisaterumi
F1-Score        0.9826
Base Model      BERTimbau
Training Data   MacMorpho corpus
Paper DOI       10.59681/2175-4411.v15.iEspecial.2023.1086

What is postagger-portuguese?

postagger-portuguese is a Part-of-Speech (POS) tagger for the Portuguese language. It was built by fine-tuning the BERTimbau model on the MacMorpho corpus, and it reaches a 98.26% F1-score when assigning one of 27 grammatical categories to each token of Portuguese text.

Implementation Details

The model was trained for up to 30 epochs with a batch size of 32 and a learning rate of 1e-5. Early stopping ends training after 3 epochs without improvement. Input sequences are limited to 200 tokens. The architecture is the BERTimbau base model, fine-tuned for POS tagging.
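The hyperparameters and the early-stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the author's training script: the `TrainConfig` and `stopping_epoch` names are hypothetical, and the code assumes the monitored quantity is a validation score where higher is better, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the text; the field names are illustrative.
    max_epochs: int = 30
    batch_size: int = 32
    learning_rate: float = 1e-5
    max_seq_len: int = 200
    patience: int = 3  # epochs without improvement before stopping


def stopping_epoch(val_scores, config=TrainConfig()):
    """Return the 1-based epoch at which training would end.

    val_scores holds one validation score per epoch (assumed: higher is better).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores[: config.max_epochs], start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= config.patience:
                return epoch  # patience exhausted: stop early
    # No early stop: training ran through all available epochs.
    return min(len(val_scores), config.max_epochs)
```

With `patience=3`, a run whose validation score peaks at epoch 2 and never strictly improves afterwards would stop at epoch 5, well before the 30-epoch cap.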

  • 27 distinct POS tag classes
  • 98.26% F1-score on the evaluation set
  • Optimized for clinical and general Portuguese text
  • Comprehensive tag set including specialized categories like ADV-KS-REL and PRO-KS

Core Capabilities

  • Advanced morphological analysis of Portuguese text
  • Identification of complex grammatical structures
  • Support for both clinical and general domain text
  • High-precision tagging of pronouns, verbs, and specialized linguistic elements

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its accuracy on Portuguese POS tagging across both clinical and general text. Its 98.26% F1-score is reported as state-of-the-art on the MacMorpho corpus, which makes it a strong default for Portuguese NLP applications.

Q: What are the recommended use cases?

The model is well suited to processing Electronic Health Records and clinical narratives in Portuguese, where it reaches 81.45% accuracy on clinical texts compared to 76.56% for generic models. It is also suitable for general linguistic analysis, academic research, and any NLP pipeline that needs accurate Portuguese part-of-speech tagging.
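For use inside an NLP pipeline, one practical detail is that BERT-style models predict a tag per WordPiece rather than per word. The sketch below shows one common way to collapse those predictions into word-level tags, using the tag of each word's first piece. Several things here are assumptions rather than facts from the source: the Hub identifier `lisaterumi/postagger-portuguese`, the helper names `merge_wordpieces` and `tag_text`, and the expectation that the model's configuration maps label ids to MacMorpho tag names.

```python
def merge_wordpieces(token_predictions):
    """Collapse WordPiece-level predictions into (word, tag) pairs.

    Each word takes the tag predicted for its first piece; continuation
    pieces (prefixed with "##") only extend the word's text.
    """
    words = []
    for pred in token_predictions:
        piece, tag = pred["word"], pred["entity"]
        if piece.startswith("##") and words:
            prev_word, prev_tag = words[-1]
            words[-1] = (prev_word + piece[2:], prev_tag)
        else:
            words.append((piece, tag))
    return words


def tag_text(text, model_id="lisaterumi/postagger-portuguese"):
    """Tag raw text with the model.

    model_id is an assumed Hugging Face Hub identifier; the first call
    downloads the weights and needs the transformers and torch packages.
    """
    from transformers import pipeline  # imported lazily so the helper above has no dependencies

    tagger = pipeline("token-classification", model=model_id)
    return merge_wordpieces(tagger(text))
```

Calling `tag_text("O paciente dormiu bem.")` would return a list of `(word, tag)` pairs. Taking the first piece's tag is a simple convention; other strategies, such as majority vote over pieces, are equally possible.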
