robertuito-ner
Property | Value |
---|---|
Author | pysentimiento |
Downloads | 135,753 |
Paper | View Paper |
Framework | PyTorch |
What is robertuito-ner?
robertuito-ner is a Named Entity Recognition (NER) model for Spanish/English code-switched text, particularly from social media. It is fine-tuned from RoBERTuito, a RoBERTa model pre-trained on Spanish tweets, and reaches an F1 score of 68.5 on the LinCE NER benchmark.
Implementation Details
The model is implemented in PyTorch and integrated into the pysentimiento library. It is fine-tuned on the Spanish-English portion of the LinCE NER corpus, a code-switching benchmark, starting from RoBERTuito-base, which was pre-trained on Spanish Twitter data.
- Built on RoBERTuito architecture
- Trained on LinCE NER corpus
- Optimized for code-switched Spanish-English content
- Integrated with pysentimiento library
Core Capabilities
- Named Entity Recognition in Spanish and English tweets
- Handles code-switched content effectively
- Identifies various entity types including PER (Person) and LOC (Location)
- Provides entity position information (start/end indices)
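Token-classification NER models typically emit one label per token, and entity spans with start/end indices are then recovered by merging those labels. The sketch below shows that merging step under two assumptions that are not taken from this model or from pysentimiento's own post-processing: a BIO labeling scheme (B- opens an entity, I- continues it, O is outside) and plain whitespace tokenization. The helper `bio_to_entities` and its output keys are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def bio_to_entities(
    text: str, offsets: List[Tuple[int, int]], tags: List[str]
) -> List[Dict[str, object]]:
    """Merge per-token BIO tags into entity spans with character offsets.

    Hypothetical helper. offsets[i] is the (start, end) character span of
    token i in `text`; tags[i] is its BIO label, e.g. "B-PER", "I-PER", "O".
    """
    entities: List[Dict[str, object]] = []
    current = None  # entity being built: {"type", "start", "end"}
    for (start, end), tag in zip(offsets, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-")
            and (current is None or current["type"] != tag[2:])
        )
        if starts_new:
            # A B- tag, or an I- tag that cannot continue the open entity,
            # closes any open entity and opens a new one.
            if current is not None:
                entities.append(current)
            current = {"type": tag[2:], "start": start, "end": end}
        elif tag.startswith("I-"):
            # Same-type continuation: extend the open entity's end offset.
            current["end"] = end
        else:
            # "O" closes any open entity.
            if current is not None:
                entities.append(current)
                current = None
    if current is not None:
        entities.append(current)
    # Recover each entity's surface text from its character offsets.
    for ent in entities:
        ent["text"] = text[ent["start"]:ent["end"]]
    return entities


if __name__ == "__main__":
    text = "Vi a Bad Bunny in Miami last night"  # code-switched example
    # Assumed whitespace tokenization, not the model's real tokenizer.
    offsets = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    # Hand-written labels for illustration, not actual model output.
    tags = ["O", "O", "B-PER", "I-PER", "O", "B-LOC", "O", "O"]
    print(bio_to_entities(text, offsets, tags))
```

Running it prints a PER span "Bad Bunny" (characters 5 to 14) and a LOC span "Miami" (characters 18 to 23). Treating a stray I- tag as the start of a new entity is one common convention; stricter decoders drop such tags instead.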
Frequently Asked Questions
Q: What makes this model unique?
This model specializes in code-switched (mixed Spanish-English) social media text, which is particularly challenging for traditional NER models. Its LinCE NER score (68.5 F1) is close to that of the larger XLM Large model (69.5), while its base model is pre-trained specifically on Twitter data.
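NER benchmark scores such as the ones compared above are commonly reported as F1 over predicted entity spans. The sketch below computes exact-match, micro-averaged entity F1, assuming each entity is represented as a hypothetical (type, start, end) tuple; the official LinCE scorer may differ in details such as tokenization or partial-match handling, so treat this as an illustration of the metric rather than the benchmark's code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (entity type, start offset, end offset).
Span = Tuple[str, int, int]


def entity_f1(gold: Iterable[Set[Span]], pred: Iterable[Set[Span]]) -> float:
    """Micro-averaged F1 over exact-match entity spans, pooled across texts.

    A predicted span counts as correct only if its type and both offsets
    match a gold span exactly.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # predicted spans that exactly match a gold span
        fp += len(p - g)  # predicted spans with no exact gold match
        fn += len(g - p)  # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [{("PER", 5, 14), ("LOC", 18, 23)}]
    pred = [{("PER", 5, 14), ("LOC", 18, 22)}]  # LOC boundary off by one
    print(entity_f1(gold, pred))  # → 0.5
```

Because matching is exact, a single-character boundary error counts as both a false positive and a miss, which is one reason entity-level F1 on noisy tweets tends to be lower than token-level scores.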
Q: What are the recommended use cases?
The model is well suited to applications that need named entity recognition in Spanish social media content, especially where text may mix Spanish and English. Common use cases include social media monitoring, content analysis, and information extraction from tweets.