arabic-t5-small

Property	Value
Author	flax-community
Training Time	22h 23m 51s
Evaluation Accuracy	56.84%
Vocabulary Size	64,000
Model URL	HuggingFace

What is arabic-t5-small?

arabic-t5-small is a T5 v1.1 small model trained for Arabic. The training data combined the Arabic Billion Words corpus with the Arabic subsets of the mC4 and OSCAR datasets. Because of time constraints, training covered only about 10% of the full dataset: 22,000 steps, or roughly 4.3 billion tokens.
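The step and token counts can be cross-checked with a little arithmetic. The sketch below uses the 22,000 steps quoted above and the training batch size of 384 listed under Implementation Details. The sequence length of 512 tokens is an assumption, since the page does not state it, but it happens to reproduce the quoted figure.

```python
# Rough consistency check of "22,000 steps or 4.3 billion tokens".
STEPS = 22_000      # from the page
BATCH_SIZE = 384    # from the page (training batch size)
SEQ_LEN = 512       # ASSUMPTION: sequence length is not stated on the page

tokens_seen = STEPS * BATCH_SIZE * SEQ_LEN
summary = f"{tokens_seen:,} tokens (~{tokens_seen / 1e9:.1f}B)"
print(summary)  # 4,325,376,000 tokens (~4.3B)
```

If the actual sequence length differed, the total would scale proportionally, so treat this only as a plausibility check.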

Implementation Details

Unlike most other Arabic language models, this one keeps diacritics in its vocabulary. Training used a batch size of 384, a learning rate of 1e-2, and the jnp.float32 dtype. Preprocessing was intentionally minimal: only URLs, emails, and social media mentions were replaced with fixed tokens.

Training batch size: 384
Evaluation batch size: 768
Learning rate: 1e-2
Tokenizer trained on 5% of training set
Vocabulary size: 64,000 tokens
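The minimal preprocessing described above could look something like the following sketch. It is not the authors' code: the placeholder strings ([URL], [EMAIL], [USER]) and the regular expressions are hypothetical, because the page does not say which fixed tokens or patterns were used. The point it illustrates is that everything else, including diacritics, passes through unchanged.

```python
import re

# HYPOTHETICAL placeholder tokens; the page does not name the real ones.
URL_TOKEN = "[URL]"
EMAIL_TOKEN = "[EMAIL]"
MENTION_TOKEN = "[USER]"

# Deliberately simple patterns, chosen for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")


def preprocess(text: str) -> str:
    """Replace emails, URLs, and @mentions with fixed tokens; leave the rest as-is."""
    # Emails go first so the mention pattern cannot match the "@domain" part.
    text = EMAIL_RE.sub(EMAIL_TOKEN, text)
    text = URL_RE.sub(URL_TOKEN, text)
    text = MENTION_RE.sub(MENTION_TOKEN, text)
    return text


sample = "contact me@example.com or see https://example.com via @someone"
print(preprocess(sample))
```

No normalization step strips diacritics here, so a vocalized word such as كَتَبَ reaches the tokenizer exactly as written.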

Core Capabilities

Arabic text generation and processing
Keeps Arabic diacritics in the vocabulary instead of stripping them
Suitable for fine-tuning on specific tasks
Achieves 56.84% evaluation accuracy

Frequently Asked Questions

Q: What makes this model unique?

The model keeps Arabic diacritics in its vocabulary, unlike most other Arabic language models. Its preprocessing is also minimal: the text is left as written, and only URLs, emails, and social media mentions are replaced with fixed tokens.

Q: What are the recommended use cases?

The model suits Arabic text processing tasks where diacritics matter. Pre-training was done with dropout turned off, so enable dropout when fine-tuning; the recommended rate is 0.1.
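One way to turn dropout back on for fine-tuning is to override the config value at load time, as in the sketch below. It assumes the Hugging Face transformers library and assumes the checkpoint is published on the Hub under the id flax-community/arabic-t5-small; neither detail is confirmed on this page. The helper name load_for_finetuning is made up for illustration.

```python
# ASSUMPTION: Hub id inferred from the author and model name on this page.
MODEL_ID = "flax-community/arabic-t5-small"
FINETUNE_DROPOUT = 0.1  # rate recommended on this page


def load_for_finetuning(model_id: str = MODEL_ID, dropout_rate: float = FINETUNE_DROPOUT):
    """Load tokenizer and model with dropout enabled (hypothetical helper)."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Extra keyword arguments that match config fields override the stored
    # config, so this sets T5Config.dropout_rate to the requested value.
    model = T5ForConditionalGeneration.from_pretrained(model_id, dropout_rate=dropout_rate)
    return tokenizer, model


# Usage (downloads the checkpoint):
#   tokenizer, model = load_for_finetuning()
#   model.train()  # dropout is only active in training mode
```

If the repository turns out to ship only Flax weights, passing from_flax=True to from_pretrained may be needed.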
