NuNER-multilingual-v0.1

numind

Multilingual BERT-based model for entity recognition supporting 9 languages, achieving state-of-the-art performance with a 0.6231 macro F1 score using its two-embedding approach.

  • License: MIT
  • Paper: arXiv:2402.15343
  • Supported Languages: English, French, German, Italian, Spanish, Portuguese, Polish, Dutch, Russian
  • Best F1 Score: 0.6231 (with two embeddings)

What is NuNER-multilingual-v0.1?

NuNER-multilingual-v0.1 is a state-of-the-art entity recognition model built on the Multilingual BERT architecture. It is fine-tuned on an artificially annotated multilingual subset of the OSCAR dataset, providing domain- and language-independent embeddings for entity recognition tasks. The model demonstrates clear improvements over the base mBERT model, achieving a 0.6231 macro F1 score with its two-embedding approach.

Implementation Details

The model is implemented using the Transformers library and PyTorch, offering flexible deployment options for both inference and fine-tuning. It uniquely combines outputs from different layers of the transformer architecture to achieve superior performance.

  • Built on Multilingual BERT base architecture
  • Supports both single and two-embedding approaches (see the sketch after this list)
  • Implements token classification pipeline
  • Provides pre-trained weights optimized for entity recognition
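
A minimal sketch of the single-embedding setup with the Transformers library, assuming the checkpoint is published on the Hugging Face Hub as numind/NuNER-multilingual-v0.1 (the example sentence is purely illustrative):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "numind/NuNER-multilingual-v0.1"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

text = "NuMind est basée à Paris."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Single-embedding approach: use the last hidden layer as token representations
# (one 768-dimensional vector per subword token for an mBERT-base encoder).
token_embeddings = outputs.hidden_states[-1]
print(token_embeddings.shape)  # (1, seq_len, 768)
```

These token embeddings can feed any downstream token-classification head; the two-embedding variant discussed in the FAQ below concatenates a second hidden layer onto each token vector.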

Core Capabilities

  • Multilingual entity recognition across all 9 supported languages
  • Domain-independent embedding generation
  • Flexible integration with existing NLP pipelines
  • Support for both inference and fine-tuning workflows
  • Superior performance compared to base mBERT (0.6231 vs. 0.5206 macro F1)

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to generate high-quality, domain- and language-independent embeddings for entity recognition. Its two-embedding approach, which combines token representations from two transformer layers, yields the best reported result of 0.6231 macro F1, well above the 0.5206 achieved by base mBERT (see the sketch below).
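
A minimal sketch of the two-embedding approach, under the same assumption about the Hub identifier; the specific pair of hidden layers behind the reported 0.6231 score is not stated in this card, so concatenating the last two layers below is an illustrative assumption only:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "numind/NuNER-multilingual-v0.1"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

inputs = tokenizer("Angela Merkel besuchte Madrid.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # tuple of (1, seq_len, 768) tensors

# Two-embedding approach: concatenate two hidden layers per token, giving a
# richer 1536-dimensional representation for the downstream NER head.
# NOTE: the layer indices (-1 and -2) are assumptions for illustration.
token_embeddings = torch.cat((hidden_states[-1], hidden_states[-2]), dim=-1)
print(token_embeddings.shape)  # (1, seq_len, 1536)
```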

Q: What are the recommended use cases?

The model is ideal for multilingual entity recognition tasks, cross-lingual information extraction, and as a foundation for fine-tuning on specific domain entity recognition tasks. It's particularly valuable for applications requiring robust entity recognition across multiple languages.
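
A minimal fine-tuning sketch, again assuming the same Hub identifier; the label set is hypothetical and the classification head starts untrained, so it must be trained on annotated data before its predictions are meaningful:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "numind/NuNER-multilingual-v0.1"  # assumed Hub identifier
labels = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# Fine-tune `model` on annotated data (e.g. with the Trainer API), then run
# inference through the token-classification pipeline in any supported language.
ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
print(ner("Marie Curie est née à Varsovie."))  # labels are arbitrary until the head is trained
```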
