roberta-base-bne-capitel-ner-plus

Maintained By: PlanTL-GOB-ES

License: Apache 2.0
Language: Spanish
Task: Named Entity Recognition
F1 Score: 89.60%
Paper: RoBERTa Paper

What is roberta-base-bne-capitel-ner-plus?

This is a Spanish Named Entity Recognition (NER) model fine-tuned from roberta-base-bne, a RoBERTa-base model pre-trained on what its authors describe as the largest Spanish corpus known to date: 570GB of text collected by the National Library of Spain (Biblioteca Nacional de España). It is designed to recognize named entities even when they appear in lowercase, which makes it more robust to inconsistent capitalization than its predecessor, roberta-base-bne-capitel-ner.

Implementation Details

The model was fine-tuned on the CAPITEL competition dataset and adapted to handle named entities in both upper- and lowercase. Training used a batch size of 16 and a learning rate of 5e-5 for 5 epochs, and the best checkpoint was selected using the downstream-task metric on the development set.

  • Pre-trained on 570GB of clean, deduplicated Spanish text
  • Fine-tuned on the CAPITEL-NERC dataset
  • Optimized for recognizing lowercase named entities
  • Achieves an F1 score of 89.60% on the test set
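
The reported settings map onto standard Hugging Face TrainingArguments parameters. This is a minimal sketch under stated assumptions, not the authors' training script (the card does not publish one): it assumes a standard Trainer setup, and the helper names, checkpoint names and F1 values are hypothetical. select_best_checkpoint illustrates picking the checkpoint with the highest development-set F1.

```python
from typing import Dict

# Fine-tuning hyperparameters reported on this card, expressed with the
# parameter names used by transformers.TrainingArguments.
HYPERPARAMS = {
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
    "num_train_epochs": 5,
}


def select_best_checkpoint(dev_f1: Dict[str, float]) -> str:
    """Return the checkpoint name with the highest development-set F1.

    Hypothetical helper: dev_f1 maps checkpoint names to F1 scores.
    """
    if not dev_f1:
        raise ValueError("no checkpoints to choose from")
    return max(dev_f1, key=dev_f1.get)


def build_training_args(output_dir: str):
    """Build TrainingArguments from HYPERPARAMS (requires transformers)."""
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

For example, select_best_checkpoint({"checkpoint-500": 0.871, "checkpoint-1000": 0.896}) returns "checkpoint-1000". Importing transformers lazily keeps the selection logic usable without the library installed.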

Core Capabilities

  • Named Entity Recognition in Spanish text
  • Robust performance on lowercase text
  • Integrates with the Hugging Face Transformers pipeline API
  • Reported to outperform multilingual alternatives on this task
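
A minimal usage sketch, assuming the Hub ID is PlanTL-GOB-ES/roberta-base-bne-capitel-ner-plus (the maintainer name plus the model name on this card). Without aggregation, the Transformers "ner" pipeline returns one prediction per subword token, each with "entity", "index", "start" and "end" fields. Whether this model emits BIO or BIOES tags is an assumption here, so the hypothetical group_entities helper accepts B-, I-, E- and S- prefixes and merges tokens into labelled spans using character offsets.

```python
from typing import Dict, List, Optional

# Assumed Hub ID: maintainer name plus model name from this card.
MODEL_ID = "PlanTL-GOB-ES/roberta-base-bne-capitel-ner-plus"


def group_entities(text: str, tokens: List[Dict]) -> List[Dict[str, str]]:
    """Merge token-level NER predictions into labelled text spans.

    Each token dict needs "entity" (e.g. "B-PER"), "index" (token position)
    and "start"/"end" (character offsets into text). Accepts BIO and BIOES.
    """
    spans: List[Dict[str, str]] = []
    current: Optional[Dict] = None  # span currently being built

    def close() -> None:
        nonlocal current
        if current is not None:
            spans.append({"label": current["label"],
                          "text": text[current["start"]:current["end"]]})
            current = None

    for tok in tokens:
        if tok["entity"] == "O":  # token outside any entity
            close()
            continue
        prefix, _, label = tok["entity"].partition("-")
        # Extend the open span only for an adjacent I-/E- token of the same label.
        continues = (
            current is not None
            and prefix in ("I", "E")
            and label == current["label"]
            and tok["index"] == current["last_index"] + 1
        )
        if continues:
            current["end"] = tok["end"]
            current["last_index"] = tok["index"]
        else:
            close()
            current = {"label": label, "start": tok["start"],
                       "end": tok["end"], "last_index": tok["index"]}
        if prefix in ("E", "S"):  # E- and S- tokens end an entity
            close()
    close()
    return spans


def run_demo(sentence: str = "Me llamo francisco javier y vivo en madrid.") -> List[Dict[str, str]]:
    """Download the model and tag one sentence (needs transformers and network)."""
    from transformers import pipeline

    ner = pipeline("ner", model=MODEL_ID)
    return group_entities(sentence, ner(sentence))
```

Checking token indices matters because the pipeline drops "O" tokens by default, so two separate entities could otherwise look contiguous. If the model turns out to use plain BIO tags, the pipeline's own aggregation_strategy="simple" option is a simpler alternative.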

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to recognize lowercase named entities in Spanish text, and for building on a base model pre-trained on 570GB of Spanish text from the National Library of Spain.

Q: What are the recommended use cases?

The model is well suited to Named Entity Recognition in Spanish text, particularly informal text or content where capitalization is inconsistent. Typical applications include information extraction, text analysis, and automated content processing.
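
Because the case for this model rests on inconsistent capitalization, one hypothetical way to check that robustness on your own data is to run the model on each text both as written and fully lowercased, then measure how many entities survive. The lowercase_agreement helper below is illustrative and is not part of the model or of Transformers; it assumes you already have (entity text, label) pairs from whatever grouping step you use.

```python
from typing import Iterable, Tuple

Entity = Tuple[str, str]  # (entity text, label)


def lowercase_agreement(cased: Iterable[Entity], lowered: Iterable[Entity]) -> float:
    """Share of entities found in the original text that are also found,
    with the same label, after the text is lowercased.

    Entity text is compared case-insensitively. Returns 1.0 when the
    original text yields no entities, since nothing can be lost.
    """
    reference = {(text.lower(), label) for text, label in cased}
    if not reference:
        return 1.0
    recovered = {(text.lower(), label) for text, label in lowered}
    return len(reference & recovered) / len(reference)
```

For example, lowercase_agreement([("Francisco Javier", "PER"), ("Madrid", "LOC")], [("francisco javier", "PER")]) returns 0.5. A score near 1.0 only shows that predictions are stable under lowercasing for that sample; whether the cased predictions were correct still has to be checked against labelled data.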
