bert-base-romanian-ner

Maintained by: dumitrescustefan

Property: Value
License: MIT
Paper: View Paper
Author: dumitrescustefan
Downloads: 20,481

What is bert-base-romanian-ner?

bert-base-romanian-ner is a BERT model fine-tuned for Named Entity Recognition (NER) in Romanian text. It is built on the bert-base-romanian-cased-v1 model and recognizes 15 entity types, including persons, organizations, and locations, with reported state-of-the-art performance. The model was trained on RONEC version 2.0, which comprises 12,330 sentences with over 500,000 tokens and 80,283 distinct entity annotations.

Implementation Details

The model uses the BIO2 annotation scheme and can be used through either the Transformers pipeline or the dedicated roner Python package. Its reported scores are 92.77% for entity type identification and 89.22% for strict matching.

  • Trained on RONEC v2.0 dataset with comprehensive entity coverage
  • Supports 15 distinct entity types with both Beginning (B-) and Inside (I-) labels
  • Scores above 90% on entity type identification (92.77%) and just under 90% on strict matching (89.22%)
  • Handles complex Romanian language features
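
To make the BIO2 scheme concrete, here is a minimal sketch of how B- and I- tags can be grouped back into entity spans. It is not taken from the model or from the roner package. The function name bio2_to_spans, the example sentence, and the tag names (PERSON, ORG) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def bio2_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO2-tagged tokens into (entity_type, text) spans.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the entity currently being built, if there is one.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B- always opens a new entity, even directly after another one.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # I- continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close whatever is open and skip this token.
            flush()
    flush()
    return spans


# Example sentence with assumed tag names.
tokens = ["Ion", "Popescu", "lucrează", "la", "Universitatea", "din", "București", "."]
tags = ["B-PERSON", "I-PERSON", "O", "O", "B-ORG", "I-ORG", "I-ORG", "O"]
spans = bio2_to_spans(tokens, tags)
print(spans)
```

Dropping an I- tag that has no matching open entity is one lenient choice among several; a stricter decoder could raise an error instead.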

Core Capabilities

  • Named Entity Recognition for persons, organizations, and locations
  • Temporal expression recognition (datetime, period)
  • Quantitative entity detection (money, numeric, ordinal)
  • Cultural entity identification (works of art, events)
  • Geographical and political entity recognition
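
As a rough illustration of how 15 entity types turn into a BIO2 label set, the sketch below builds one B- and one I- label per type plus a single O label, which gives 31 labels in total. The tag names are an assumption based on the categories listed above and on RONEC naming, and the ordering is arbitrary. The authoritative mapping is the id2label field of the model's configuration.

```python
from typing import List

# Assumed tag names for the 15 entity types; verify against the model's id2label.
ENTITY_TYPES: List[str] = [
    "PERSON", "NAT_REL_POL", "ORG", "GPE", "LOC", "FACILITY",
    "LANGUAGE", "EVENT", "WORK_OF_ART",
    "DATETIME", "PERIOD", "MONEY", "QUANTITY", "NUMERIC", "ORDINAL",
]


def build_bio2_labels(entity_types: List[str]) -> List[str]:
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


LABELS = build_bio2_labels(ENTITY_TYPES)
print(len(ENTITY_TYPES), "entity types,", len(LABELS), "BIO2 labels")
```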

Frequently Asked Questions

Q: What makes this model unique?

This model is designed specifically for Romanian NER and covers 15 entity types with reported state-of-the-art accuracy. It is one of the few models trained on a large-scale Romanian named entity corpus.

Q: What are the recommended use cases?

The model is ideal for Romanian text analysis tasks including information extraction, document processing, and automated content analysis. It's particularly useful for applications requiring identification of names, organizations, dates, and quantities in Romanian text.
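
An information extraction step along these lines might look like the sketch below. It assumes the model is published on the Hugging Face Hub as dumitrescustefan/bert-base-romanian-ner and is loaded through the Transformers token-classification pipeline with aggregation_strategy="simple", which returns dictionaries with entity_group, word, and score keys. The helper names load_ner_pipeline and select_entities, the sample data, and the tag names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed Hub identifier for the model described here.
MODEL_ID = "dumitrescustefan/bert-base-romanian-ner"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so select_entities works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )


def select_entities(
    entities: Iterable[Dict], wanted: Iterable[str], min_score: float = 0.0
) -> List[Dict]:
    """Keep entities whose type is in `wanted` and whose score is high enough."""
    wanted_set = set(wanted)
    return [
        e for e in entities
        if e["entity_group"] in wanted_set and e["score"] >= min_score
    ]


# Hand-written sample in the shape the pipeline returns (not real model output).
sample = [
    {"entity_group": "PERSON", "score": 0.99, "word": "Ion Popescu"},
    {"entity_group": "DATETIME", "score": 0.55, "word": "ieri"},
    {"entity_group": "ORG", "score": 0.97, "word": "Universitatea din București"},
]
names_and_orgs = select_entities(sample, ["PERSON", "ORG"], min_score=0.9)
print([e["word"] for e in names_and_orgs])

# With the model available, real output would come from:
#   ner = load_ner_pipeline()
#   entities = ner("Ion Popescu lucrează la Universitatea din București.")
```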
