gpt2-small-arabic

by akhooli

Arabic GPT2-small model trained on Arabic Wikipedia data (900MB). Achieves a perplexity of 72.19. Suitable for text and poetry generation demos.

Property             Value
Author               akhooli
Training Data        Arabic Wikipedia (900MB)
Framework            Fastai2
Model Base           GPT2-small
Performance Metrics  Perplexity: 72.19, Loss: 4.28, Accuracy: 0.307

What is gpt2-small-arabic?

gpt2-small-arabic is a GPT2-based language model specifically trained on Arabic Wikipedia content. Developed by akhooli in 2020, this model represents an important step in Arabic natural language processing, offering capabilities for both text and poetry generation.

Implementation Details

The model was trained using the Fastai2 library on Kaggle's free GPU infrastructure. It uses the GPT2-small architecture as its foundation and was trained on approximately 900MB of Arabic Wikipedia data.

  • Built on GPT2-small architecture
  • Trained using Fastai2 framework
  • Utilizes Arabic Wikipedia corpus
  • Supports both general text and poetry generation
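As a usage illustration, the trained checkpoint can be loaded through the Hugging Face transformers `pipeline` API. This is a hedged sketch, not part of the original card: it assumes the weights are published on the Hugging Face Hub under the repo id `akhooli/gpt2-small-arabic`, and the Arabic prompt is an arbitrary example.

```python
# Sketch: generating Arabic text with the published checkpoint.
# Assumes `transformers` is installed and the model is hosted on the
# Hugging Face Hub as "akhooli/gpt2-small-arabic"; the first call
# downloads the weights over the network.
from transformers import pipeline

generator = pipeline("text-generation", model="akhooli/gpt2-small-arabic")

# The model was trained on undiacritized Wikipedia text, so prompts
# should likewise omit diacritics.
prompt = "القدس مدينة تاريخية"  # "Jerusalem is a historic city"
outputs = generator(prompt, max_length=50, do_sample=True, num_return_sequences=1)
print(outputs[0]["generated_text"])
```

Because `do_sample=True` is set, each run produces a different continuation; for reproducible demos, pass a fixed seed via `transformers.set_seed`.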

Core Capabilities

  • Arabic text generation
  • Poetry generation (via a fine-tuned variant)
  • Perplexity of 72.19
  • 30.7% next-token accuracy on training data
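The reported metrics are internally consistent: perplexity is the exponential of the cross-entropy loss, so the card's loss of 4.28 recovers roughly the quoted 72.19 (the small gap comes from rounding the loss). A minimal check:

```python
import math

# Perplexity = exp(cross-entropy loss). With the card's reported loss of
# 4.28, this reproduces the quoted perplexity of 72.19 up to rounding.
loss = 4.28
perplexity = math.exp(loss)
print(round(perplexity, 2))  # 72.24 (the card's 72.19 implies loss ≈ 4.2793)
```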

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically designed for Arabic language generation, trained on Wikipedia data, making it suitable for generating Arabic content. It's particularly notable for supporting both regular text and poetry generation through fine-tuning.

Q: What are the recommended use cases?

The model is recommended primarily for demonstration and proof-of-concept purposes. Due to limitations in training data quality (including lack of diacritics) and coverage, it's not recommended for production use. It's particularly useful for academic research and experimental Arabic text generation projects.
