DarijaBERT

SI2M-Lab

DarijaBERT is a BERT model for Moroccan Arabic (Darija). It has 209M parameters and was trained on about 3M sequences of dialectal text.

Property          Value
Parameter Count   209M
Model Type        BERT-based
Training Data     3M sequences (~100M tokens)
Authors           SI2M-Lab

What is DarijaBERT?

DarijaBERT is the first BERT model designed specifically for the Moroccan Arabic dialect (Darija). It was developed jointly by AIOX Lab and SI2M Lab (INSEA) to meet the need for processing Moroccan dialectal text in natural language applications.

Implementation Details

The model follows the BERT-base architecture but was trained without the Next Sentence Prediction (NSP) objective, so masked language modeling is its only pretraining task. The training corpus is 691MB of text (approximately 100M tokens) drawn from three sources: Darija stories, YouTube comments from 40 Moroccan channels, and tweets containing Darija keywords.

  • Architecture based on BERT-base without NSP
  • Training corpus of ~3 Million sequences
  • Weights available for PyTorch
  • Weights distributed in Safetensors format
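
A model trained without NSP learns only from masked-token prediction. The sketch below illustrates BERT's standard masking recipe: select about 15% of tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged. This is an illustration and not DarijaBERT's training code. The source does not state which masking rate or scheme was used, and `mask_tokens` and all of its parameters are hypothetical names introduced here.

```python
import random


def mask_tokens(tokens, mask_token="[MASK]", vocab=None, mask_prob=0.15, seed=None):
    """Apply BERT-style masking to a list of tokens.

    Returns (masked_tokens, labels). labels[i] is the original token at
    positions selected for prediction and None everywhere else.
    Assumption: standard BERT settings (15% selection, 80/10/10 split).
    """
    rng = random.Random(seed)
    # Fall back to the input tokens as a stand-in vocabulary for random swaps.
    vocab = vocab or tokens
    masked, labels = [], []
    for tok in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token, nothing to predict here.
            masked.append(tok)
            labels.append(None)
            continue
        labels.append(tok)
        roll = rng.random()
        if roll < 0.8:
            masked.append(mask_token)         # 80%: replace with the mask token
        elif roll < 0.9:
            masked.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            masked.append(tok)                # 10%: leave unchanged
    return masked, labels
```

In practice the input would be the subword tokens produced by the model's own tokenizer, and the loss would be computed only at positions where the label is not None.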

Core Capabilities

  • Masked-token prediction (fill-mask) on Darija text
  • Pretrained directly on Moroccan Arabic dialect rather than adapted from a Modern Standard Arabic model
  • Loads through Hugging Face's transformers library
  • Targets dialectal language understanding
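
As a minimal sketch of the fill-mask capability, the code below wraps the `fill-mask` pipeline from the transformers library. The Hub id `SI2M-Lab/DarijaBERT` is an assumption inferred from the author name on this page and should be verified before use. `load_fill_mask` and `top_predictions` are hypothetical helper names, and the model is only downloaded when `load_fill_mask` is called.

```python
# Assumed Hugging Face Hub id; not stated on this page, verify before use.
MODEL_ID = "SI2M-Lab/DarijaBERT"


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (downloads the weights on first call)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask, text, k=5):
    """Return the k best (token, score) candidates for a single masked slot.

    `text` must contain exactly one mask token; with several masks the
    pipeline returns one list per mask instead of a flat list.
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], round(r["score"], 4)) for r in results]


# Example usage (requires network access and the transformers package):
# fill_mask = load_fill_mask()
# mask = fill_mask.tokenizer.mask_token
# print(top_predictions(fill_mask, f"<Darija sentence with one {mask} slot>"))
```

Reading the mask token from `fill_mask.tokenizer.mask_token` avoids hard-coding `[MASK]`, which keeps the snippet valid if the tokenizer uses a different mask string.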

Frequently Asked Questions

Q: What makes this model unique?

DarijaBERT is the first BERT model trained specifically on the Moroccan Arabic dialect. Its training data combines stories, YouTube comments, and tweets, so it covers colloquial Moroccan Arabic as it is written in several informal settings.

Q: What are the recommended use cases?

The model suits tasks that involve Moroccan Arabic text: masked language modeling out of the box, and downstream tasks such as text classification after fine-tuning. It is particularly useful for researchers and developers working with Moroccan Arabic content.
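
For the text classification use case, one possible starting point is to load the pretrained encoder with a fresh classification head through transformers. This is a sketch under assumptions: the Hub id `SI2M-Lab/DarijaBERT` is inferred and not confirmed by this page, the label names are placeholders, and `label_maps` and `build_classifier` are hypothetical helpers. The classification head is randomly initialized, so the model must be fine-tuned on labeled Darija data before its predictions mean anything.

```python
def label_maps(labels):
    """Build id2label / label2id dictionaries from an iterable of label names."""
    ordered = sorted(set(labels))
    id2label = dict(enumerate(ordered))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def build_classifier(labels, model_id="SI2M-Lab/DarijaBERT"):
    """Load tokenizer and encoder with an untrained classification head.

    Assumption: `model_id` is the correct Hub id; verify before use.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model


# Example usage (requires network access and the transformers package):
# tokenizer, model = build_classifier(["negative", "positive"])
```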
