bert-base-multilingual-cased-masakhaner
| Property | Value |
|---|---|
| Developer | Davlan |
| Model Base | mBERT (bert-base-multilingual-cased) |
| Task | Named Entity Recognition |
| Languages | 9 African languages |
| Paper | MasakhaNER: Named Entity Recognition for African Languages |
What is bert-base-multilingual-cased-masakhaner?
This is the first Named Entity Recognition (NER) model covering this set of African languages. Built on the mBERT (bert-base-multilingual-cased) architecture, it has been fine-tuned on the MasakhaNER dataset to recognize four entity types (DATE, LOC, ORG, PER) across 9 African languages: Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian Pidgin, Swahili, Wolof, and Yorùbá.
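A minimal usage sketch with the Hugging Face transformers library (the Nigerian Pidgin example sentence is illustrative; any of the supported languages works):

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")
model = AutoModelForTokenClassification.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")

# Build a token-classification pipeline around the model
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Nigerian Pidgin example sentence (illustrative)
example = "Emir of Kano turban Zhang wey don spend 18 years for Nigeria"
print(nlp(example))
```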
Implementation Details
The model was trained on a single NVIDIA V100 GPU using the hyperparameters recommended in the original MasakhaNER paper. It achieves F1 scores ranging from 66.27% (Wolof) to 88.96% (Nigerian Pidgin), state-of-the-art results for African-language NER at the time of release.
- Fine-tuned mBERT base model
- Supports 9 African languages
- Trained on MasakhaNER dataset
- Uses the BIO tagging scheme for entity identification (illustrated below)
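As a quick illustration of the BIO scheme (the sentence and tags here are invented for the example, not model output): each entity opens with a B- tag, continues with I- tags, and non-entity tokens are tagged O.

```python
# Hypothetical BIO-tagged sentence (invented for illustration, not model output)
tokens = ["Ahmed", "visited", "Lagos", "on", "12",     "May"]
tags   = ["B-PER", "O",       "B-LOC", "O",  "B-DATE", "I-DATE"]

# Print each token next to its BIO tag
for token, tag in zip(tokens, tags):
    print(f"{token:8} {tag}")
```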
Core Capabilities
- Recognition of person names (PER)
- Detection of location entities (LOC)
- Identification of organization names (ORG)
- Recognition of date and time expressions (DATE)
- Support for multilingual input
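The concrete label set behind these capabilities can be read off the model configuration; a minimal sketch assuming the standard transformers config API:

```python
from transformers import AutoConfig

# The config's id2label map lists the BIO tags the classifier head predicts,
# i.e. O plus B-/I- variants of DATE, LOC, ORG, and PER
config = AutoConfig.from_pretrained("Davlan/bert-base-multilingual-cased-masakhaner")
print(config.id2label)
```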
Frequently Asked Questions
Q: What makes this model unique?
This was the first NER model to cover these 9 African languages, which are underserved by mainstream NLP tooling, and it delivered state-of-the-art results at release. Packaging all nine languages in a single checkpoint makes it a practical foundation for African language processing.
Q: What are the recommended use cases?
The model is particularly suited to news article analysis, information extraction, and other text processing applications in African languages; it was trained on entity-annotated news text. It outputs token-level entity annotations and can distinguish between consecutive entities of the same type using the BIO tagging scheme, as in the sketch below.
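A sketch of that behavior, assuming the transformers pipeline's aggregation option (the example sentence is invented): with aggregation_strategy="simple", subword predictions are merged into entity spans, and a fresh B- tag starts a new span even when it directly follows an entity of the same type.

```python
from transformers import pipeline

# Group B-/I- tagged subwords into whole entity spans
nlp = pipeline(
    "ner",
    model="Davlan/bert-base-multilingual-cased-masakhaner",
    aggregation_strategy="simple",
)

# Two adjacent person names should come back as separate PER spans,
# because the second name opens with its own B-PER tag
print(nlp("Muhammadu Buhari Bola Tinubu dey for Abuja"))
```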