wikineural-multilingual-ner

Maintained By
Babelscape

WikiNeural Multilingual NER

  • Parameter Count: 177M
  • License: CC BY-NC-SA 4.0
  • Languages Supported: German, English, Spanish, French, Italian, Dutch, Polish, Portuguese, Russian
  • Downloads: 361,305
  • Framework: PyTorch

What is wikineural-multilingual-ner?

WikiNeural Multilingual NER is a state-of-the-art named entity recognition model that supports 9 languages. Built by Babelscape, it is based on mBERT and fine-tuned on the WikiNEuRal dataset, which was designed specifically to address data scarcity in multilingual NER.

Implementation Details

The model is a multilingual BERT fine-tuned for 3 epochs on the WikiNEuRal dataset. The dataset itself combines neural and knowledge-based approaches to create silver annotations, yielding high-quality training corpora for NER.

  • Built on mBERT architecture with 177M parameters
  • Implements transformer-based token classification (see the sketch after this list)
  • Supports TensorBoard integration
  • Uses Safetensors for model storage
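
As a concrete illustration of the token-classification setup, the minimal sketch below loads the checkpoint through the standard transformers Auto classes and prints one predicted tag per sub-token. The Hub ID is inferred from the card's title and maintainer, and the example sentence is illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hub ID inferred from the card's title and maintainer
model_name = "Babelscape/wikineural-multilingual-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

inputs = tokenizer("My name is Wolfgang and I live in Berlin.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring tag per sub-token, decoded via the model's id2label map
predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, tag_id in zip(tokens, predictions):
    print(f"{token}\t{model.config.id2label[tag_id]}")
```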

Core Capabilities

  • Multilingual NER across 9 European languages
  • Robust performance on encyclopedic content
  • Easy integration with the Hugging Face transformers pipeline (see the sketch after this list)
  • Grouped entity recognition support
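
A minimal usage sketch with the transformers pipeline API; aggregation_strategy="simple" is the current replacement for the older grouped_entities=True flag, and the printed output shape is illustrative.

```python
from transformers import pipeline

# "simple" aggregation merges word pieces into grouped entity spans
ner = pipeline(
    "ner",
    model="Babelscape/wikineural-multilingual-ner",
    aggregation_strategy="simple",
)

results = ner("My name is Wolfgang and I live in Berlin.")
print(results)
# Illustrative output shape:
# [{'entity_group': 'PER', 'word': 'Wolfgang', ...},
#  {'entity_group': 'LOC', 'word': 'Berlin', ...}]
```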

Frequently Asked Questions

Q: What makes this model unique?

This model's training data was built by combining neural and knowledge-based approaches for silver data creation, yielding an improvement of up to 6 span-based F1 points over previous state-of-the-art systems. The model is particularly effective on encyclopedic content while supporting multiple languages simultaneously.

Q: What are the recommended use cases?

The model is best suited for multilingual NER tasks, particularly with Wikipedia-style content. For optimal results in other domains, it's recommended to combine WikiNEuRal with domain-specific datasets like CoNLL.
