NuNER-multilingual-v0.1

Maintained By: numind

  • License: MIT
  • Paper: arXiv:2402.15343
  • Supported Languages: English, French, German, Italian, Spanish, Portuguese, Polish, Dutch, Russian
  • Best F1 Score: 0.6231 (with two embeddings)

What is NuNER-multilingual-v0.1?

NuNER-multilingual-v0.1 is an entity recognition model built on the Multilingual BERT (mBERT) architecture. It is fine-tuned on an artificially annotated multilingual subset of the OSCAR dataset, producing domain- and language-independent embeddings for entity recognition tasks. The model improves substantially on base mBERT, reaching a 0.6231 F1 macro score with its two-embedding approach, versus 0.5206 for the base model.

Implementation Details

The model is implemented using the Transformers library and PyTorch, offering flexible deployment options for both inference and fine-tuning. It uniquely combines outputs from different layers of the transformer architecture to achieve superior performance.

  • Built on the Multilingual BERT base architecture
  • Supports both single- and two-embedding approaches
  • Implements a token classification pipeline
  • Provides pre-trained weights optimized for entity recognition
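The two-embedding approach described above can be sketched in PyTorch. This is a minimal illustration, not the model's exact recipe: the layer indices, tensor sizes, and the choice of which intermediate layer to concatenate are assumptions standing in for the hidden states a Transformers model returns when called with `output_hidden_states=True`.

```python
import torch

# Illustrative dimensions: mBERT-base hidden size is 768, and a call with
# output_hidden_states=True returns 13 tensors (embeddings + 12 encoder layers).
batch, seq_len, hidden = 2, 16, 768
num_layers = 13

# Dummy stand-ins for `output.hidden_states` (one tensor per layer).
hidden_states = [torch.randn(batch, seq_len, hidden) for _ in range(num_layers)]

# Single-embedding approach: use the last layer only.
single = hidden_states[-1]                                        # (2, 16, 768)

# Two-embedding approach: concatenate the last layer with an intermediate
# layer along the feature dimension (the intermediate index is hypothetical).
two = torch.cat([hidden_states[-1], hidden_states[-7]], dim=-1)   # (2, 16, 1536)

print(single.shape, two.shape)
```

The same concatenation applies unchanged to real model outputs; only the source of `hidden_states` differs.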

Core Capabilities

  • Multilingual entity recognition across nine languages
  • Domain-independent embedding generation
  • Flexible integration with existing NLP pipelines
  • Support for both inference and fine-tuning workflows
  • Higher F1 macro score than base mBERT (0.6231 vs. 0.5206)

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its ability to generate high-quality embeddings for entity recognition across multiple languages while maintaining domain independence. Its two-embedding approach lifts performance well above the base mBERT model (0.5206 to 0.6231 F1 macro).

Q: What are the recommended use cases?

The model is ideal for multilingual entity recognition tasks, cross-lingual information extraction, and as a foundation for fine-tuning on specific domain entity recognition tasks. It's particularly valuable for applications requiring robust entity recognition across multiple languages.
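For the fine-tuning use case above, a common pattern is to place a lightweight token classifier on top of the model's embeddings. The sketch below assumes the two-embedding setup (2 × 768 features per token) and an illustrative five-label entity scheme; the dummy tensor stands in for the encoder's output, and none of the sizes or labels are prescribed by the model card.

```python
import torch
import torch.nn as nn

# Hypothetical label set and embedding width (two concatenated mBERT-base layers).
num_labels = 5            # e.g. O, B-PER, I-PER, B-ORG, I-ORG
emb_dim = 2 * 768

# A linear head mapping each token embedding to entity-label logits.
classifier = nn.Linear(emb_dim, num_labels)

# Dummy token embeddings standing in for the encoder's output: (batch, seq, emb_dim).
token_embeddings = torch.randn(2, 16, emb_dim)

logits = classifier(token_embeddings)   # (2, 16, 5): per-token label scores
predictions = logits.argmax(dim=-1)     # (2, 16): per-token label ids

print(logits.shape, predictions.shape)
```

In practice one would train this head (optionally with the encoder frozen) on domain-specific annotated data, which is exactly the foundation-model workflow the card recommends.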
