wikineural-multilingual-ner

Maintained By
Babelscape

WikiNeural Multilingual NER

  • Parameter Count: 177M
  • License: CC BY-NC-SA 4.0
  • Languages Supported: German, English, Spanish, French, Italian, Dutch, Polish, Portuguese, Russian
  • Downloads: 361,305
  • Framework: PyTorch

What is wikineural-multilingual-ner?

WikiNeural Multilingual NER is a state-of-the-art named entity recognition model that supports 9 languages. Built by Babelscape, it is based on mBERT and fine-tuned on the WikiNEuRal dataset, which was designed specifically to address data scarcity in multilingual NER.

Implementation Details

The model is a multilingual BERT fine-tuned for 3 epochs on the WikiNEuRal dataset. The dataset itself combines neural and knowledge-based approaches to create silver annotations, yielding high-quality training corpora for NER.

  • Built on mBERT architecture with 177M parameters
  • Implements transformer-based token classification (see the sketch after this list)
  • Supports TensorBoard integration
  • Uses Safetensors for model storage
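
As a concrete illustration of the token-classification setup, the minimal sketch below loads the checkpoint through the standard transformers Auto classes and prints one predicted tag per sub-token. The Hub ID is inferred from the card's title and maintainer, and the example sentence is illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hub ID inferred from the card's title and maintainer
model_name = "Babelscape/wikineural-multilingual-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("My name is Wolfgang and I live in Berlin.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring tag per sub-token, decoded via the model's id2label map
predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, tag_id in zip(tokens, predictions):
    print(f"{token}\t{model.config.id2label[tag_id]}")
```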

Core Capabilities

  • Multilingual NER across 9 European languages
  • Robust performance on encyclopedic content
  • Easy integration with the Hugging Face transformers pipeline (see the sketch after this list)
  • Grouped entity recognition support
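
A minimal usage sketch with the transformers pipeline API; aggregation_strategy="simple" is the current replacement for the older grouped_entities=True flag, and the printed output shape is illustrative.

```python
from transformers import pipeline

# "simple" aggregation merges word pieces into grouped entity spans
ner = pipeline(
    "ner",
    model="Babelscape/wikineural-multilingual-ner",
    aggregation_strategy="simple",
)

results = ner("My name is Wolfgang and I live in Berlin.")
print(results)
# Illustrative output shape:
# [{'entity_group': 'PER', 'word': 'Wolfgang', ...},
#  {'entity_group': 'LOC', 'word': 'Berlin', ...}]
```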

Frequently Asked Questions

Q: What makes this model unique?

This model's training data was built by combining neural and knowledge-based approaches for silver data creation, yielding an improvement of up to 6 span-based F1 points over previous state-of-the-art systems. The model is particularly effective on encyclopedic content while supporting multiple languages simultaneously.

Q: What are the recommended use cases?

The model is best suited for multilingual NER tasks, particularly with Wikipedia-style content. For optimal results in other domains, it's recommended to combine WikiNEuRal with domain-specific datasets like CoNLL.
