gpt2-small-arabic
| Property | Value |
|---|---|
| Author | akhooli |
| Training Data | Arabic Wikipedia (900MB) |
| Framework | Fastai2 |
| Model Base | GPT2-small |
| Performance Metrics | Perplexity: 72.19, Loss: 4.28, Accuracy: 0.307 |
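As a quick consistency check, perplexity is the exponential of the cross-entropy loss (assuming natural-log loss, which is what fastai reports), so the loss of 4.28 and the perplexity of 72.19 describe the same result:

```python
import math

# Perplexity is exp(cross-entropy loss); exp(4.28) ≈ 72.2,
# matching the reported perplexity of 72.19 up to rounding.
print(math.exp(4.28))
```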
What is gpt2-small-arabic?
gpt2-small-arabic is a GPT-2-based language model trained specifically on Arabic Wikipedia content. Developed by akhooli in 2020, it is an early GPT-2 adaptation for Arabic, offering general text generation as well as poetry generation through a separately fine-tuned variant.
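For a quick look at what the model produces, it can be loaded with the Hugging Face transformers text-generation pipeline. The snippet below assumes the checkpoint is published on the Hugging Face Hub under the author's namespace as akhooli/gpt2-small-arabic; the prompt and sampling settings are only illustrative.

```python
from transformers import pipeline

# Assumed Hub id (author's namespace); adjust if the checkpoint lives elsewhere.
generator = pipeline("text-generation", model="akhooli/gpt2-small-arabic")

prompt = "القدس مدينة تاريخية"  # "Jerusalem is a historic city"
outputs = generator(
    prompt,
    max_new_tokens=40,       # length of the continuation
    do_sample=True,          # sample instead of greedy decoding
    top_p=0.95,
    num_return_sequences=2,  # two alternative continuations
)
for out in outputs:
    print(out["generated_text"])
```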
Implementation Details
The model was trained using the Fastai2 library on Kaggle's free GPU infrastructure. It uses the GPT2-small architecture as its foundation and was trained on approximately 900MB of Arabic Wikipedia text; a rough sketch of this kind of training setup follows the list below.
- Built on GPT2-small architecture
- Trained using Fastai2 framework
- Utilizes Arabic Wikipedia corpus
- Supports both general text and poetry generation
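Fastai does not support GPT-2 out of the box, so training a Hugging Face GPT-2 model with it means wrapping the tokenizer and model in fastai abstractions. The sketch below follows the general public fastai-with-transformers pattern rather than the author's actual Kaggle notebook; the starting checkpoint, placeholder corpus, batch size, sequence length, and learning rate are all illustrative assumptions, and it uses the current fastai >= 2.0 import path into which fastai2 was merged.

```python
from fastai.text.all import *
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative starting point; the original work adapted GPT2-small to
# roughly 900MB of raw Arabic Wikipedia text.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Placeholder corpus: in practice this is a list of raw Arabic Wikipedia articles.
texts = ["نص عربي من ويكيبيديا يستخدم هنا كمثال فقط."] * 200
cut = int(0.8 * len(texts))
splits = [list(range(cut)), list(range(cut, len(texts)))]

class TransformersTokenizer(Transform):
    "Wrap the Hugging Face tokenizer so fastai can call it inside a pipeline."
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): return tensor(self.tokenizer.encode(x))
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

class DropOutput(Callback):
    "Keep only the logits from the GPT-2 output before the loss is computed."
    def after_pred(self): self.learn.pred = self.pred[0]

tls = TfmdLists(texts, TransformersTokenizer(tokenizer),
                splits=splits, dl_type=LMDataLoader)
dls = tls.dataloaders(bs=2, seq_len=64)  # tiny values for the sketch only

learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(),
                cbs=[DropOutput], metrics=[accuracy, Perplexity()])
learn.fit_one_cycle(1, 1e-4)  # one cheap epoch; real training runs far longer
```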
Core Capabilities
- Arabic text generation
- Poetry generation via a separately fine-tuned model (see the loading sketch after this list)
- Achieves 72.19 perplexity score
- 30.7% accuracy on training data
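The poetry capability comes from a separate fine-tune rather than this checkpoint itself. Assuming that variant is published on the Hub under an id such as akhooli/gpt2-small-arabic-poetry (an assumption worth verifying on the author's profile), loading it looks the same as the base model, typically with sampling enabled so the verses vary:

```python
from transformers import pipeline

# Assumed id of the poetry fine-tune; confirm the exact name on the author's Hub page.
poet = pipeline("text-generation", model="akhooli/gpt2-small-arabic-poetry")

prompt = "يا ليل الصب متى غده"  # opening of a classical Arabic verse
result = poet(prompt, max_new_tokens=48, do_sample=True, top_p=0.9)
print(result[0]["generated_text"])
```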
Frequently Asked Questions
Q: What makes this model unique?
The model is trained specifically on Arabic Wikipedia text, making it well suited to generating Arabic content, and it is notable for supporting both regular text generation and, through a fine-tuned variant, poetry generation.
Q: What are the recommended use cases?
The model is recommended primarily for demonstration and proof-of-concept purposes. Due to limitations in training data quality (including lack of diacritics) and coverage, it's not recommended for production use. It's particularly useful for academic research and experimental Arabic text generation projects.