bert-base-multilingual-cased-finetuned-yoruba


Davlan

A BERT model fine-tuned for Yoruba language processing, achieving 82.58% F1 on MasakhaNER named entity recognition and 79.11% F1 on BBC Yorùbá text classification.

Property           Value
Author             Davlan
Base Model         bert-base-multilingual-cased
Language           Yoruba
Training Hardware  NVIDIA V100 GPU

What is bert-base-multilingual-cased-finetuned-yoruba?

This is a BERT model obtained by fine-tuning bert-base-multilingual-cased (mBERT) on Yoruba-language text. It keeps the mBERT architecture and, compared with the original multilingual model, gives better results on the Yoruba text analysis benchmarks reported below.

Implementation Details

The model was fine-tuned on Yoruba text drawn from the Bible, JW300, Menyo-20k, the Yoruba Embedding corpus, CC-Aligned, Wikipedia, and news sources including BBC Yoruba, VON Yoruba, Asejere, and Alaroye. Training was conducted on a single NVIDIA V100 GPU.

  • Achieves an F1 score of 82.58% on MasakhaNER, compared with 78.97% for mBERT
  • Achieves an F1 score of 79.11% on BBC Yorùbá text classification, compared with 75.13% for mBERT
  • Supports masked token prediction through the Hugging Face Transformers fill-mask pipeline
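A minimal sketch of masked token prediction with the Transformers fill-mask pipeline follows. It assumes the checkpoint is published on the Hugging Face Hub as Davlan/bert-base-multilingual-cased-finetuned-yoruba (inferred from the model name and author above) and that transformers and a backend such as PyTorch are installed. The helper names check_masked_input and fill_mask are hypothetical, not part of any published API.

```python
from __future__ import annotations

from functools import lru_cache

# Hub ID assumed from the model name and author on this page.
MODEL_ID = "Davlan/bert-base-multilingual-cased-finetuned-yoruba"
MASK_TOKEN = "[MASK]"  # mask token used by BERT-style tokenizers


def check_masked_input(text: str) -> str:
    """Return text unchanged if it contains exactly one [MASK] token, else raise."""
    count = text.count(MASK_TOKEN)
    if count != 1:
        raise ValueError(f"expected exactly one {MASK_TOKEN} token, found {count}")
    return text


@lru_cache(maxsize=1)
def _unmasker():
    # Imported lazily so this module loads without transformers installed;
    # the first call downloads the checkpoint from the Hub and caches the pipeline.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def fill_mask(text: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k (token, score) predictions for the single [MASK] in text."""
    predictions = _unmasker()(check_masked_input(text), top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

For example, fill_mask("Abuja ni olú-ìlú orílẹ̀-èdè [MASK].") returns up to five candidate tokens with their scores. Restricting input to a single [MASK] keeps the return type flat, since the pipeline returns a nested list when one sentence contains several masks.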

Core Capabilities

  • Named Entity Recognition in Yoruba text
  • Text Classification tasks
  • Masked Language Modeling
  • Context-aware token prediction

Frequently Asked Questions

Q: What makes this model unique?

This model is tuned specifically for Yoruba and outperforms the base multilingual BERT model on the Yoruba NER and text classification benchmarks listed above. Its training corpus draws on religious texts, news, Wikipedia, and web-crawled data, which should help on real-world Yoruba text from similar domains.

Q: What are the recommended use cases?

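The model suits Named Entity Recognition, text classification, and general Yoruba language understanding tasks. The released checkpoint is a masked language model, so NER and classification require fine-tuning a task-specific head on labeled data. Given its training data, it is likely best suited to news content, religious texts, and general Yoruba-language documents.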
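Using the checkpoint for named entity recognition means fine-tuning a token-classification head on labeled data such as MasakhaNER. One step that is easy to get wrong is aligning word-level tags with the subword tokens produced by the WordPiece tokenizer. The sketch below assumes the MasakhaNER tag set (PER, ORG, LOC, DATE in BIO format; verify the label order against the dataset you actually load) and the common convention of labeling only the first subword of each word. NER_LABELS, LABEL2ID, ID2LABEL, and align_labels are hypothetical names.

```python
# Assumed MasakhaNER tag set in BIO format: O plus B-/I- for PER, ORG, LOC, DATE.
NER_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-DATE", "I-DATE"]
LABEL2ID = {label: i for i, label in enumerate(NER_LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}

# PyTorch's CrossEntropyLoss ignores this target value by default.
IGNORE_INDEX = -100


def align_labels(word_ids, word_labels):
    """Map word-level tags onto subword tokens.

    word_ids: per-token word index, with None for special tokens such as
        [CLS] and [SEP], as returned by a fast tokenizer's word_ids().
    word_labels: one tag string per word.
    Only the first subword of each word keeps its tag; special tokens and
    continuation subwords get IGNORE_INDEX so the loss skips them.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(LABEL2ID[word_labels[word_id]])
        previous = word_id
    return aligned
```

In practice, word_ids would come from a fast tokenizer, e.g. tokenizer(words, is_split_into_words=True).word_ids(), and the label maps can be passed to AutoModelForTokenClassification.from_pretrained("Davlan/bert-base-multilingual-cased-finetuned-yoruba", num_labels=len(NER_LABELS), id2label=ID2LABEL, label2id=LABEL2ID) so that predictions decode back to tag strings.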
