Jais-13b
| Property | Value |
|---|---|
| Parameter Count | 13 billion |
| Architecture | Transformer-based decoder-only (GPT-3 style) |
| License | Apache 2.0 |
| Training Data | 72B Arabic tokens, 279B English and code tokens |
| Paper | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models |
What is Jais-13b?
Jais-13b is a state-of-the-art bilingual large language model developed through a collaboration between Inception, MBZUAI, and Cerebras Systems. Trained on 72 billion Arabic tokens and 279 billion English and code tokens, it represents a significant advance in Arabic-English language processing. The architecture combines SwiGLU non-linearity in the feed-forward layers with ALiBi position embeddings, which allow the model to handle sequences longer than those seen during training.
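As a quick-start illustration, the model can be loaded through the Hugging Face transformers library. This is a minimal sketch: the checkpoint identifier and the need for trust_remote_code=True are assumptions based on common practice for custom architectures, so check the official model card for the exact id and loading requirements.

```python
# Minimal loading-and-generation sketch (identifiers are assumptions, not official usage).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "inceptionai/jais-13b"  # assumed Hugging Face checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # down-cast to fit a single large GPU; released weights may differ
    trust_remote_code=True,       # assumed: custom architectures ship their own modeling code
    device_map="auto",
)

prompt = "The United Arab Emirates is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```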
Implementation Details
The model uses a transformer-based decoder-only architecture in the style of GPT-3 and was trained on the Condor Galaxy 1 (CG-1) supercomputer platform. Training used the AdamW optimizer with a carefully tuned learning-rate schedule and a batch size of 1920. To balance the bilingual corpus, the Arabic data was iterated roughly 1.6 times while the English and code data were seen about once, so the 72 billion raw Arabic tokens contribute around 116 billion effective training tokens.
- Implements ALiBi position embeddings for improved handling of long sequences (sketched after this list)
- Uses SwiGLU non-linearity in the feed-forward blocks
- Trained with fp32 precision and adaptive learning rates
- Uses a tokenizer built to cover both Arabic and English text
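For intuition, below is a minimal PyTorch sketch of the two components named above: the ALiBi attention bias and a SwiGLU feed-forward block. It is an illustrative re-implementation of the published techniques, not Jais's actual code, and the head count and layer sizes are placeholder values.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention bias from the ALiBi paper (causal form)."""
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]               # (seq, seq); j - i is <= 0 for past tokens
    bias = slopes[:, None, None] * rel[None, :, :]  # (heads, seq, seq); linear distance penalty
    # Block attention to future positions for causal decoding.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return bias.masked_fill(causal, float("-inf"))


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(x W_gate) * x W_up)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


# Toy usage with placeholder sizes (not the real Jais-13b dimensions).
scores = torch.randn(8, 16, 16)                     # (heads, seq, seq) attention logits
attn = (scores + alibi_bias(num_heads=8, seq_len=16)).softmax(dim=-1)
ffn = SwiGLU(d_model=64, d_hidden=172)
out = ffn(torch.randn(2, 16, 64))
```

Because ALiBi biases attention by distance instead of relying on learned position embeddings, it is the mechanism that lets the model generalize to sequences longer than those seen in training.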
Core Capabilities
- Bilingual text generation in Arabic and English (see the usage example after this list)
- State-of-the-art performance on Arabic language tasks
- Strong reasoning and knowledge capabilities
- Suitable for research and commercial applications
- Effective for chat assistants and customer service
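As a usage sketch of the bilingual generation capability, the same checkpoint can be prompted in either language. The pipeline call below rests on the same assumptions as the earlier loading example (checkpoint id, trust_remote_code) and is not official usage guidance.

```python
from transformers import pipeline

# Assumed checkpoint id; custom architectures typically need trust_remote_code=True.
generator = pipeline(
    "text-generation",
    model="inceptionai/jais-13b",
    trust_remote_code=True,
    device_map="auto",
)

# English prompt
print(generator("The capital of the United Arab Emirates is",
                max_new_tokens=32)[0]["generated_text"])

# Arabic prompt ("The capital of the United Arab Emirates is ...")
print(generator("عاصمة دولة الإمارات العربية المتحدة هي",
                max_new_tokens=32)[0]["generated_text"])
```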
Frequently Asked Questions
Q: What makes this model unique?
Jais-13b stands out for its bilingual capabilities, particularly in Arabic, where it reports state-of-the-art performance across a broad suite of Arabic benchmarks. Its training recipe, which iterates over the Arabic data roughly 1.6 times, strengthens Arabic performance while maintaining strong English capabilities.
Q: What are the recommended use cases?
The model is well suited to academic research in Arabic NLP, business applications targeting Arabic-speaking audiences, and developers adding Arabic-language features to their products. It is particularly relevant for chat assistants, customer service, and general language processing tasks in both Arabic and English.