nerkor-cars-onpp-hubert

Maintained by: novakat

Property              Value
Base Model            SZTAKI-HLT/hubert-base-cc
Max Sequence Length   448 tokens
Paper                 NerKor+Cars-OntoNotes++
Author                novakat

What is nerkor-cars-onpp-hubert?

nerkor-cars-onpp-hubert is a Hungarian named entity recognition (NER) model that recognizes more than 30 entity types. It is fine-tuned from huBERT (SZTAKI-HLT/hubert-base-cc), a Hungarian BERT model, on an extended version of the NYTK-NerKor corpus. The extended annotation covers the OntoNotes 5.0 categories and adds custom entity types designed for Hungarian text.

Implementation Details

The model is trained on approximately 1 million tokens from NYTK-NerKor, supplemented with 12,000 tokens of automotive text from hvg.hu. Its label set goes well beyond the four categories of the CoNLL-2002 scheme (person, organization, location, miscellaneous).

  • Built on the pretrained SZTAKI-HLT/hubert-base-cc model
  • Supports sequence lengths up to 448 tokens
  • Incorporates all OntoNotes 5.0 entity types plus custom extensions
  • Features specialized automotive and media entity recognition capabilities
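
Because inputs are capped at 448 tokens, longer documents need to be split before inference. The following is a minimal sketch of one way to do that with overlapping windows, not the author's own preprocessing. It assumes that the Hugging Face repository ID is novakat/nerkor-cars-onpp-hubert (inferred from the author and model name, not confirmed on this page), that two positions are reserved for BERT-style special tokens, and that the transformers library is installed if the loader is called. The names window_spans and load_ner_pipeline are hypothetical helpers.

```python
from typing import List, Tuple

# Assumption: repository ID inferred from the author and model name.
MODEL_ID = "novakat/nerkor-cars-onpp-hubert"
MAX_LEN = 448        # maximum sequence length stated in the model card
SPECIAL_TOKENS = 2   # assumption: [CLS] and [SEP] added by a BERT-style tokenizer


def window_spans(n_tokens: int, max_len: int = MAX_LEN,
                 overlap: int = 64) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of overlapping windows over n_tokens.

    Each window holds at most max_len - SPECIAL_TOKENS tokens, and
    consecutive windows share `overlap` tokens so that an entity cut at a
    window boundary appears whole in the following window.
    """
    body = max_len - SPECIAL_TOKENS
    if not 0 <= overlap < body:
        raise ValueError("overlap must be non-negative and smaller than the window body")
    if n_tokens <= 0:
        return []
    spans = []
    start = 0
    while True:
        end = min(start + body, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start = end - overlap  # step back so neighbouring windows overlap
    return spans


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily; optional dependency
    return pipeline("token-classification", model=model_id,
                    aggregation_strategy="simple")
```

The spans index subword tokens, so they would be applied to the tokenizer's output rather than to whitespace-separated words. Predictions from overlapping regions still have to be merged, for example by keeping the span with the higher score.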

Core Capabilities

  • Recognizes standard entities (PER, ORG, LOC, GPE, and others)
  • Handles temporal expressions (DATE, TIME, DUR)
  • Processes numerical entities (PERCENT, MONEY, QUANTITY)
  • Identifies specialized categories (CAR, MEDIA, SMEDIA)
  • Supports extended miscellaneous categories (AWARD, PROJ)
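
The label families listed above can be used to organize predictions downstream. The sketch below groups entities by family; it is an illustration and not part of the model. It assumes the input is a list of dictionaries with "entity_group" and "word" keys, which is the shape produced by the transformers token-classification pipeline when aggregation is enabled. The mapping contains only the labels named on this page, so any other label falls into "other". The names LABEL_FAMILIES, family_of, and group_by_family, as well as the sample predictions, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Only the labels named in this model card; the full tag set is larger.
LABEL_FAMILIES = {
    "standard": {"PER", "ORG", "LOC", "GPE"},
    "temporal": {"DATE", "TIME", "DUR"},
    "numerical": {"PERCENT", "MONEY", "QUANTITY"},
    "specialized": {"CAR", "MEDIA", "SMEDIA"},
    "miscellaneous": {"AWARD", "PROJ"},
}


def family_of(label: str) -> str:
    """Map a label such as 'B-CAR' or 'CAR' to its family, or 'other'."""
    # Strip a BIO prefix if one is present.
    if label[:2] in ("B-", "I-"):
        label = label[2:]
    for family, labels in LABEL_FAMILIES.items():
        if label in labels:
            return family
    return "other"


def group_by_family(entities: Iterable[dict]) -> Dict[str, List[str]]:
    """Collect the surface form of each predicted entity under its family."""
    grouped = defaultdict(list)
    for ent in entities:
        grouped[family_of(ent["entity_group"])].append(ent["word"])
    return dict(grouped)


# Hypothetical predictions, written by hand for illustration only.
sample = [
    {"entity_group": "CAR", "word": "Suzuki Vitara"},
    {"entity_group": "DATE", "word": "2021"},
    {"entity_group": "EVENT", "word": "Sziget"},
]
grouped = group_by_family(sample)
```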

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its broad coverage of entity types: it combines the standard OntoNotes 5.0 categories with additional entity types designed for Hungarian. In particular, it recognizes automotive entities and several media categories, which most Hungarian NER models trained on the four-category scheme do not distinguish.

Q: What are the recommended use cases?

The model is suited to Hungarian text analysis tasks that need fine-grained entity classification. It fits news content and automotive texts especially well, and it can also be applied to general Hungarian documents.
