bert-base-arabert

aubmindlab

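AraBERT is an Arabic BERT model with 136M parameters, pre-trained on 77 million sentences (23GB) of Arabic text. The AraBERT family covers both segmented and non-segmented text; this version expects input pre-segmented with the Farasa Segmenter. It is built for Arabic NLP tasks.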

Property        Value
Model Size      543MB
Parameters      136M
Training Data   77M sentences (23GB)
Architecture    BERT-Base
Author          aubmindlab
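
The listed model size and parameter count are consistent with each other if the weights are stored as 32-bit floats. The storage format is an assumption here, not something the table states. A quick check:

```python
# Rough size check: parameters x bytes per parameter.
PARAMS = 136_000_000      # parameter count from the table above
BYTES_PER_PARAM = 4       # assumption: weights stored as float32

size_mb = PARAMS * BYTES_PER_PARAM / 1_000_000  # decimal megabytes
print(f"{size_mb:.0f} MB")  # 544 MB
```

The result is within rounding of the listed 543MB, since 136M is itself a rounded figure.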

What is bert-base-arabert?

bert-base-arabert is an Arabic language model based on Google's BERT architecture. It is version 1 of the AraBERT family and expects Arabic text that has been pre-segmented with the Farasa Segmenter. The model was trained on a corpus of 77 million sentences, about 23GB of Arabic text.

Implementation Details

The model uses the BERT-Base configuration and was trained on TPUv2-8 hardware for 1.2M steps. Input text is segmented with the Farasa Segmenter before tokenization, a preprocessing step that is crucial for the model's performance on downstream NLP tasks.
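
To see how the BERT-Base configuration leads to a parameter count near 136M, the sketch below adds up the weights of a standard BERT-Base encoder (12 layers, hidden size 768, feed-forward size 3072, 512 positions). The vocabulary size of 64,000 is an assumption made for illustration and is not stated on this page; `bert_param_count` is a hypothetical helper, not part of any library.

```python
def bert_param_count(vocab_size, hidden=768, layers=12,
                     intermediate=3072, max_positions=512, type_vocab=2):
    """Count the weights and biases of a BERT encoder with a pooler."""
    layer_norm = 2 * hidden  # scale + bias
    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections, plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers (up and down projection), plus LayerNorm.
    feed_forward = ((hidden * intermediate + intermediate)
                    + (intermediate * hidden + hidden)
                    + layer_norm)
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


# Assumption: a 64,000-entry vocabulary.
total = bert_param_count(vocab_size=64_000)
print(f"{total / 1e6:.1f}M parameters")  # 135.2M parameters
```

Under these assumptions the encoder comes to about 135M parameters, in line with the stated 136M. The embedding table alone accounts for roughly 49M of them, which is why the total is higher than for BERT-Base models with smaller vocabularies.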

  • Pre-segmented text processing using Farasa Segmenter
  • 136M parameters
  • Trained on multiple Arabic datasets including Wikipedia, news articles, and web content
  • Available for both PyTorch and TensorFlow
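
A minimal loading sketch for the PyTorch side, assuming the model is published on the Hugging Face Hub under the id `aubmindlab/bert-base-arabert` (inferred from the author and model name on this page) and that the `transformers` and `torch` packages are installed. The helper names `load_arabert` and `embed` are hypothetical. Farasa segmentation is not shown; the text passed to `embed` is assumed to be segmented already.

```python
MODEL_ID = "aubmindlab/bert-base-arabert"  # assumed Hub id (author/model name)


def load_arabert(model_id: str = MODEL_ID):
    """Load the tokenizer and PyTorch encoder; downloads weights on first call."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(segmented_text: str, tokenizer, model):
    """Return the [CLS] vector for text that is already Farasa-segmented."""
    import torch

    inputs = tokenizer(segmented_text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # First token of the last hidden layer, shape (1, hidden_size).
    return outputs.last_hidden_state[:, 0]


if __name__ == "__main__":
    tokenizer, model = load_arabert()
    # Placeholder input; replace with Farasa-segmented Arabic text.
    vector = embed("نص عربي", tokenizer, model)
    print(vector.shape)
```

For the tasks listed under Core Capabilities, the same id could be passed to a task-specific class such as `AutoModelForSequenceClassification` and fine-tuned on labeled data.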

Core Capabilities

  • Sentiment Analysis across multiple datasets (HARD, ASTD-Balanced, ArsenTD-Lev, LABR)
  • Named Entity Recognition with ANERcorp
  • Arabic Question Answering (Arabic-SQuAD and ARCD)
  • Arabic text preprocessing and tokenization with Farasa segmentation

Frequently Asked Questions

Q: What makes this model unique?

AraBERT stands out for its Arabic-specific text processing, in particular its pre-segmentation approach using the Farasa Segmenter. This makes it effective on Arabic NLP tasks compared to multilingual models.

Q: What are the recommended use cases?

The model is suited to Arabic NLP tasks including sentiment analysis, named entity recognition, and question answering. It fits applications that require a deep understanding of Arabic text structure and semantics.
