hubert-base-cc

SZTAKI-HLT

Hungarian BERT model trained on Common Crawl and Wikipedia, achieving SOTA performance on NER (97.62%) and chunking tasks. Developed by SZTAKI-HLT.

Property        Value
Author          SZTAKI-HLT
Model Type      BERT (Cased)
Language        Hungarian
Training Data   Hungarian Common Crawl + Wikipedia
Best NER Score  97.62%

What is hubert-base-cc?

huBERT-base-cc is a cased BERT model for Hungarian language processing, developed by SZTAKI-HLT. It was trained on filtered and deduplicated Hungarian content from Common Crawl, combined with Wikipedia.

Implementation Details

The model follows the BERT base architecture and was trained specifically on Hungarian text. Its reported results are strongest on token classification tasks:

  • Achieves state-of-the-art results on Hungarian NER (97.62%)
  • Excellent performance on chunking tasks (Minimal NP: 97.14%, Maximal NP: 96.97%)
  • Outperforms multilingual BERT on Hungarian tasks

Core Capabilities

  • Named Entity Recognition
  • Chunking (both minimal and maximal NP)
  • General Hungarian language understanding
  • Token classification tasks
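Both NER and chunking are usually framed as per-token tagging, so the model's output is a sequence of labels that still has to be grouped into entities or phrases. The following is a minimal sketch of that grouping step, assuming a BIO tagging scheme. The tag names (`PER`, `LOC`), the example sentence, and the function name `bio_to_spans` are illustrative assumptions and are not taken from the model's documentation.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans: List[Tuple[str, int, int]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open span simply extends it.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open, if there is one.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new span; a stray I- is treated as a span start as well.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hypothetical example: "Kovács János Budapesten él ."
tokens = ["Kovács", "János", "Budapesten", "él", "."]
tags = ["B-PER", "I-PER", "B-LOC", "O", "O"]  # assumed label set
spans = bio_to_spans(tags)
```

Here `spans` holds one `PER` span covering the first two tokens and one `LOC` span covering the third. The same grouping applies to chunking if the labels mark noun phrases instead of entity types.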

Frequently Asked Questions

Q: What makes this model unique?

This model is trained specifically on Hungarian text rather than on a multilingual corpus, and it outperforms multilingual BERT on Hungarian tasks. It reports state-of-the-art results on Hungarian NER (97.62%) and on NP chunking (minimal NP: 97.14%, maximal NP: 96.97%).

Q: What are the recommended use cases?

The model is best suited to token classification tasks in Hungarian, especially Named Entity Recognition and chunking. It can be used like any other cased BERT model.
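As a rough sketch of what fine-tuning for token classification might look like, the snippet below loads the checkpoint with the Hugging Face `transformers` library and aligns word-level labels to subword tokens. It assumes the model is published on the Hugging Face Hub under the identifier `SZTAKI-HLT/hubert-base-cc`. The helper names `align_labels` and `load_for_token_classification` are hypothetical, and the alignment rule (label only the first subword of each word) is one common convention, not something the model card prescribes.

```python
from typing import List, Optional

MODEL_ID = "SZTAKI-HLT/hubert-base-cc"  # assumed Hugging Face Hub identifier
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Spread word-level labels over subword tokens.

    `word_ids` maps each token to the index of its source word, with None
    for special tokens such as [CLS] and [SEP]. Only the first subword of
    each word keeps the label; everything else is masked out of the loss.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)            # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])    # first subword of a word
        else:
            aligned.append(IGNORE_INDEX)            # continuation subword
        previous = word_id
    return aligned


def load_for_token_classification(num_labels: int):
    """Load the tokenizer and the encoder with a new token classification head."""
    # Imported lazily so align_labels works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(
        MODEL_ID, num_labels=num_labels
    )
    return tokenizer, model
```

With a fast tokenizer, the `word_ids` argument can be obtained from `tokenizer(words, is_split_into_words=True).word_ids()`. The classification head created this way is randomly initialized, so the model needs fine-tuning on labeled Hungarian data before its predictions are meaningful.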
