roberta-base-bne-capitel-ner-plus
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Language | Spanish |
| Task | Named Entity Recognition |
| F1 Score | 89.60% |
| Paper | RoBERTa Paper |
What is roberta-base-bne-capitel-ner-plus?
This is a specialized Spanish Named Entity Recognition (NER) model built on the RoBERTa architecture and pre-trained on the largest Spanish corpus to date: 570GB of text from the National Library of Spain (Biblioteca Nacional de España, BNE). It is specifically designed to recognize named entities in lowercase Spanish text, making it more robust than its predecessor.
Implementation Details
The model was fine-tuned on the CAPITEL competition dataset (CAPITEL-NERC) with specific optimizations for handling both uppercase and lowercase named entities. Training used a batch size of 16 and a learning rate of 5e-5 over 5 epochs, with the final checkpoint selected by downstream task performance.
- Pre-trained on 570GB of clean, deduplicated Spanish text
- Fine-tuned on CAPITEL-NERC dataset
- Optimized for recognizing lowercase named entities
- Achieves 89.60% F1 score on test set
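As a sketch of how the hyperparameters above could map onto the Hugging Face `Trainer` API: the batch size, learning rate, and epoch count come from this section, while the output directory, save/eval cadence, and metric name are illustrative assumptions, not the authors' actual training script.

```python
from transformers import TrainingArguments

# Batch size, learning rate, and epoch count from the section above;
# all other values are assumptions for illustration.
training_args = TrainingArguments(
    output_dir="roberta-bne-capitel-ner",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    num_train_epochs=5,
    evaluation_strategy="epoch",   # newer transformers versions call this eval_strategy
    save_strategy="epoch",
    load_best_model_at_end=True,   # checkpoint selection by the downstream metric
    metric_for_best_model="f1",
)
```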
Core Capabilities
- Named Entity Recognition in Spanish text
- Robust performance on lowercase text
- Easy integration with Transformers pipeline
- State-of-the-art performance compared to multilingual alternatives
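To illustrate the pipeline integration mentioned above: the `ner` pipeline can be pointed at this model (the hub id below is assumed from the model name), and with aggregation disabled it emits one BIO tag per sub-token. A minimal, self-contained sketch of merging those tags into entity spans follows; the sample tokens and tags are illustrative, not real model output.

```python
# Pipeline integration (hub id assumed from the model name; requires a
# model download, so it is shown commented out):
#   from transformers import pipeline
#   ner = pipeline("ner", model="PlanTL-GOB-ES/roberta-base-bne-capitel-ner-plus")

def merge_bio(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity begins
            if current:
                spans.append(current)
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and tag[2:] == current[0]:
            current[1].append(token)        # continue the open entity
        else:                               # "O" (or a stray I-) closes it
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, " ".join(words)) for label, words in spans]

# Illustrative lowercase input and tags (not real model output):
tokens = ["me", "llamo", "francisco", "javier", "y", "vivo", "en", "madrid"]
tags = ["O", "O", "B-PER", "I-PER", "O", "O", "O", "B-LOC"]
print(merge_bio(tokens, tags))  # → [('PER', 'francisco javier'), ('LOC', 'madrid')]
```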
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized ability to handle lowercase named entities in Spanish text, trained on an unprecedented volume of Spanish language data from the National Library of Spain.
Q: What are the recommended use cases?
The model is ideal for Named Entity Recognition tasks in Spanish text, particularly when dealing with informal text or content where proper capitalization may not be consistent. It's suitable for applications in information extraction, text analysis, and automated content processing.