Jais-13b
| Property | Value |
|---|---|
| Parameter Count | 13 billion |
| Architecture | Transformer-based decoder-only (GPT-3 style) |
| License | Apache 2.0 |
| Training Data | 72B Arabic tokens, 279B English and code tokens |
| Paper | Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models |
What is Jais-13b?
Jais-13b is a state-of-the-art bilingual large language model developed through a collaboration between Inception, MBZUAI, and Cerebras Systems. Trained on 72 billion Arabic tokens and 279 billion English and code tokens, it represents a significant advance in Arabic-English language processing. The architecture combines SwiGLU non-linearity in the feed-forward layers with ALiBi position embeddings, which allow the model to handle sequences longer than those seen during training.
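As a quick-start illustration, the model can be loaded through the Hugging Face transformers library. This is a minimal sketch: the checkpoint identifier and the need for trust_remote_code=True are assumptions based on common practice for custom architectures, so check the official model card for the exact id and loading requirements.

```python
# Minimal loading-and-generation sketch (identifiers are assumptions, not official usage).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "inceptionai/jais-13b"  # assumed Hugging Face checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # down-cast to fit a single large GPU; released weights may differ
    trust_remote_code=True,       # assumed: custom architectures ship their own modeling code
    device_map="auto",
)

prompt = "The United Arab Emirates is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```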
Implementation Details
The model uses a transformer-based decoder-only architecture in the style of GPT-3 and was trained on the Condor Galaxy 1 (CG-1) supercomputer platform. Training used the AdamW optimizer with a carefully tuned learning-rate schedule and a batch size of 1920. To balance the bilingual corpus, the Arabic data was iterated roughly 1.6 times while the English and code data were seen about once, so the 72 billion raw Arabic tokens contribute around 116 billion effective training tokens.
- Implements ALiBi position embeddings for improved handling of long sequences (sketched after this list)
- Uses SwiGLU non-linearity in the feed-forward blocks
- Trained with fp32 precision and adaptive learning rates
- Uses a tokenizer built to cover both Arabic and English text
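For intuition, below is a minimal PyTorch sketch of the two components named above: the ALiBi attention bias and a SwiGLU feed-forward block. It is an illustrative re-implementation of the published techniques, not Jais's actual code, and the head count and layer sizes are placeholder values.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear attention bias from the ALiBi paper (causal form)."""
    # Head-specific slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]               # (seq, seq); j - i is <= 0 for past tokens
    bias = slopes[:, None, None] * rel[None, :, :]  # (heads, seq, seq); linear distance penalty
    # Block attention to future positions for causal decoding.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return bias.masked_fill(causal, float("-inf"))


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(x W_gate) * x W_up)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_hidden, bias=False)
        self.up = nn.Linear(d_model, d_hidden, bias=False)
        self.down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))


# Toy usage with placeholder sizes (not the real Jais-13b dimensions).
scores = torch.randn(8, 16, 16)                     # (heads, seq, seq) attention logits
attn = (scores + alibi_bias(num_heads=8, seq_len=16)).softmax(dim=-1)
ffn = SwiGLU(d_model=64, d_hidden=172)
out = ffn(torch.randn(2, 16, 64))
```

Because ALiBi biases attention by distance instead of relying on learned position embeddings, it is the mechanism that lets the model generalize to sequences longer than those seen in training.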
Core Capabilities
- Bilingual text generation in Arabic and English (see the usage example after this list)
- State-of-the-art performance on Arabic language tasks
- Strong reasoning and knowledge capabilities
- Suitable for research and commercial applications
- Effective for chat assistants and customer service
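As a usage sketch of the bilingual generation capability, the same checkpoint can be prompted in either language. The pipeline call below rests on the same assumptions as the earlier loading example (checkpoint id, trust_remote_code) and is not official usage guidance.

```python
from transformers import pipeline

# Assumed checkpoint id; custom architectures typically need trust_remote_code=True.
generator = pipeline(
    "text-generation",
    model="inceptionai/jais-13b",
    trust_remote_code=True,
    device_map="auto",
)

# English prompt
print(generator("The capital of the United Arab Emirates is",
                max_new_tokens=32)[0]["generated_text"])

# Arabic prompt ("The capital of the United Arab Emirates is ...")
print(generator("عاصمة دولة الإمارات العربية المتحدة هي",
                max_new_tokens=32)[0]["generated_text"])
```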
Frequently Asked Questions
Q: What makes this model unique?
Jais-13b stands out for its bilingual capabilities, particularly in Arabic, where it reports state-of-the-art performance across a broad suite of Arabic benchmarks. Its training recipe, which iterates over the Arabic data roughly 1.6 times, strengthens Arabic performance while maintaining strong English capabilities.
Q: What are the recommended use cases?
The model is well suited to academic research in Arabic NLP, business applications targeting Arabic-speaking audiences, and developers adding Arabic-language features to their products. It is particularly relevant for chat assistants, customer service, and general language processing tasks in both Arabic and English.