roberta-base-bne-capitel-ner-plus
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Language | Spanish |
| Task | Named Entity Recognition |
| F1 Score | 89.60% |
| Paper | RoBERTa Paper |
What is roberta-base-bne-capitel-ner-plus?
This is a specialized Spanish Named Entity Recognition (NER) model built on the RoBERTa architecture and pre-trained on the largest Spanish corpus to date: 570GB of text from the National Library of Spain (Biblioteca Nacional de España, BNE). It is specifically designed to recognize named entities in lowercase Spanish text, making it more robust than its predecessor.
Implementation Details
The model was fine-tuned on the CAPITEL competition dataset (CAPITEL-NERC) with specific optimizations for handling both uppercase and lowercase named entities. Training used a batch size of 16 and a learning rate of 5e-5 over 5 epochs, with the final checkpoint selected by downstream task performance.
- Pre-trained on 570GB of clean, deduplicated Spanish text
- Fine-tuned on CAPITEL-NERC dataset
- Optimized for recognizing lowercase named entities
- Achieves 89.60% F1 score on test set
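As a sketch of how the hyperparameters above could map onto the Hugging Face `Trainer` API: the batch size, learning rate, and epoch count come from this section, while the output directory, save/eval cadence, and metric name are illustrative assumptions, not the authors' actual training script.

```python
from transformers import TrainingArguments

# Batch size, learning rate, and epoch count from the section above;
# all other values are assumptions for illustration.
training_args = TrainingArguments(
    output_dir="roberta-bne-capitel-ner",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    evaluation_strategy="epoch",   # newer transformers versions call this eval_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,   # checkpoint selection by the downstream metric
    metric_for_best_model="f1",
)
```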
Core Capabilities
- Named Entity Recognition in Spanish text
- Robust performance on lowercase text
- Easy integration with Transformers pipeline
- State-of-the-art performance compared to multilingual alternatives
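To illustrate the pipeline integration mentioned above: the `ner` pipeline can be pointed at this model (the hub id below is assumed from the model name), and with aggregation disabled it emits one BIO tag per sub-token. A minimal, self-contained sketch of merging those tags into entity spans follows; the sample tokens and tags are illustrative, not real model output.

```python
# Pipeline integration (hub id assumed from the model name; requires a
# model download, so it is shown commented out):
#   from transformers import pipeline
#   ner = pipeline("ner", model="PlanTL-GOB-ES/roberta-base-bne-capitel-ner-plus")

def merge_bio(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity begins
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(token)        # continue the open entity
        else:                               # "O" (or a stray I-) closes it
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

# Illustrative lowercase input and tags (not real model output):
tokens = ["me", "llamo", "francisco", "javier", "y", "vivo", "en", "madrid"]
tags = ["O", "O", "B-PER", "I-PER", "O", "O", "O", "B-LOC"]
print(merge_bio(tokens, tags))  # → [('PER', 'francisco javier'), ('LOC', 'madrid')]
```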
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized ability to handle lowercase named entities in Spanish text, trained on an unprecedented volume of Spanish language data from the National Library of Spain.
Q: What are the recommended use cases?
The model is ideal for Named Entity Recognition tasks in Spanish text, particularly when dealing with informal text or content where proper capitalization may not be consistent. It's suitable for applications in information extraction, text analysis, and automated content processing.