Jais-13b

Maintained by: inceptionai

  • Parameter Count: 13 billion
  • Architecture: Transformer-based decoder-only (GPT-3 style)
  • License: Apache 2.0
  • Training Data: 72B Arabic tokens, 279B English/code tokens
  • Paper: Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

What is jais-13b?

Jais-13b is a state-of-the-art bilingual large language model developed through a collaboration between Inception, MBZUAI, and Cerebras Systems. It represents a significant advance in Arabic-English language processing, trained on a dataset of 72 billion Arabic tokens and 279 billion English and code tokens. The architecture uses SwiGLU non-linearities in its feed-forward layers and ALiBi position embeddings, which help the model handle and extrapolate to long input sequences.
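A rough loading sketch with Hugging Face transformers is shown below; the repo id (assumed here to be inceptionai/jais-13b, based on the maintainer listed above) and the trust_remote_code requirement should be verified against the official model page.

```python
# Minimal loading sketch (assumed repo id; verify on the hosting page).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "inceptionai/jais-13b"  # assumption: official Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,   # weights are distributed in fp32 (see below)
    device_map="auto",           # requires `accelerate`; shards across devices
    trust_remote_code=True,      # Jais ships a custom decoder-only model class
)
```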

Implementation Details

The model uses a transformer-based decoder-only architecture similar to GPT-3 and was trained on the Condor Galaxy 1 (CG-1) supercomputer platform. Training used the AdamW optimizer with a tuned learning-rate schedule and a batch size of 1,920. To balance the bilingual corpus, the Arabic data was iterated 1.6 times for every single pass over the English data.

  • Implements ALiBi position embeddings for improved handling of long sequence lengths
  • Uses SwiGLU non-linearity in the feed-forward blocks (both components are sketched in the code after this list)
  • Trained in fp32 precision with adaptive learning rates (AdamW)
  • Uses a custom tokenizer designed to cover both Arabic and English efficiently
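The two components called out above can be illustrated with a minimal, generic PyTorch sketch; this shows SwiGLU and ALiBi in general form and is not the actual Jais implementation.

```python
# Generic sketch of a SwiGLU feed-forward block and the ALiBi attention bias;
# illustrative only, not the Jais source code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """The SwiGLU feed-forward variant: FFN(x) = W_down(SiLU(W_gate x) * (W_up x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_ff, bias=False)
        self.w_up = nn.Linear(d_model, d_ff, bias=False)
        self.w_down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalty added to attention scores instead of
    learned position embeddings (simple form for a power-of-two head count)."""
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]          # entry [i, j] = j - i (<= 0 for past keys)
    return slopes[:, None, None] * distance[None]   # shape: (n_heads, seq_len, seq_len)
```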

Core Capabilities

  • Bilingual text generation in Arabic and English (see the generation sketch after this list)
  • State-of-the-art performance on Arabic language tasks
  • Strong reasoning and knowledge capabilities
  • Suitable for research and commercial applications
  • Effective for chat assistants and customer service
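As a rough usage sketch of bilingual generation, assuming the model and tokenizer loaded in the earlier example (prompts and decoding parameters are illustrative, not tuned recommendations):

```python
# Generation sketch reusing the tokenizer/model from the loading example above.
def generate(prompt: str, max_new_tokens: int = 128) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.9,        # illustrative decoding settings
        temperature=0.3,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generate("عاصمة دولة الإمارات العربية المتحدة هي"))      # Arabic prompt
print(generate("The capital of the United Arab Emirates is"))  # English prompt
```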

Frequently Asked Questions

Q: What makes this model unique?

Jais-13b stands out for its bilingual capabilities, particularly in Arabic, where it achieves state-of-the-art performance across a comprehensive suite of Arabic benchmarks. Its training recipe, which iterates more often over the Arabic data, yields superior performance on Arabic language tasks while maintaining strong English capabilities.

Q: What are the recommended use cases?

The model is well-suited to academic research in Arabic NLP, to products and services targeting Arabic-speaking audiences, and to adding Arabic language capabilities to existing applications. It is particularly effective for chat assistants, customer service, and general language processing tasks in both Arabic and English.
