bert-base-multilingual-cased-masakhaner
| Property | Value |
|---|---|
| Developer | Davlan |
| Model Base | mBERT (bert-base-multilingual-cased) |
| Task | Named Entity Recognition |
| Languages | 9 African languages |
| Paper | MasakhaNER: Named Entity Recognition for African Languages |
What is bert-base-multilingual-cased-masakhaner?
This is the first Named Entity Recognition (NER) model covering this set of African languages. Built on the mBERT (bert-base-multilingual-cased) architecture, it has been fine-tuned on the MasakhaNER dataset to recognize four entity types (DATE, LOC, ORG, PER) across 9 African languages: Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian Pidgin, Swahili, Wolof, and Yorùbá.
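A minimal usage sketch with the Hugging Face transformers library (the Nigerian Pidgin example sentence is illustrative; any of the supported languages works):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")
model = AutoModelForTokenClassification.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")

# Build a token-classification pipeline around the model
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Nigerian Pidgin example sentence (illustrative)
example = "Emir of Kano turban Zhang wey don spend 18 years for Nigeria"
print(nlp(example))
```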
Implementation Details
The model was trained on a single NVIDIA V100 GPU using the hyperparameters recommended in the original MasakhaNER paper. It achieves F1 scores ranging from 66.27% (Wolof) to 88.96% (Nigerian Pidgin), state-of-the-art results for African-language NER at the time of release.
- Fine-tuned mBERT base model
- Supports 9 African languages
- Trained on MasakhaNER dataset
- Uses the BIO tagging scheme for entity identification (illustrated below)
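As a quick illustration of the BIO scheme (the sentence and tags here are invented for the example, not model output): each entity opens with a B- tag, continues with I- tags, and non-entity tokens are tagged O.

```python
# Hypothetical BIO-tagged sentence (invented for illustration, not model output)
tokens = ["Ahmed", "visited", "Lagos", "on", "12",     "May"]
tags   = ["B-PER", "O",       "B-LOC", "O",  "B-DATE", "I-DATE"]

# Print each token next to its BIO tag
for token, tag in zip(tokens, tags):
    print(f"{token:8} {tag}")
```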
Core Capabilities
- Recognition of person names (PER)
- Detection of location entities (LOC)
- Identification of organization names (ORG)
- Recognition of date and time expressions (DATE)
- Support for multilingual input
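The concrete label set behind these capabilities can be read off the model configuration; a minimal sketch assuming the standard transformers config API:

```python
from transformers import AutoConfig

# The config's id2label map lists the BIO tags the classifier head predicts,
# i.e. O plus B-/I- variants of DATE, LOC, ORG, and PER
config = AutoConfig.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")
print(config.id2label)
```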
Frequently Asked Questions
Q: What makes this model unique?
This was the first NER model to cover these 9 African languages, which are underserved by mainstream NLP tooling, and it delivered state-of-the-art results at release. Packaging all nine languages in a single checkpoint makes it a practical foundation for African language processing.
Q: What are the recommended use cases?
The model is particularly suited to news article analysis, information extraction, and other text processing applications in African languages; it was trained on entity-annotated news text. It outputs token-level entity annotations and can distinguish between consecutive entities of the same type using the BIO tagging scheme, as in the sketch below.
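A sketch of that behavior, assuming the transformers pipeline's aggregation option (the example sentence is invented): with aggregation_strategy="simple", subword predictions are merged into entity spans, and a fresh B- tag starts a new span even when it directly follows an entity of the same type.

```python
from transformers import pipeline

# Group B-/I- tagged subwords into whole entity spans
nlp = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-masakhaner",
    aggregation_strategy="simple",
)

# Two adjacent person names should come back as separate PER spans,
# because the second name opens with its own B-PER tag
print(nlp("Muhammadu Buhari Bola Tinubu dey for Abuja"))
```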