AI21-Jamba-Mini-1.6
| Property | Value |
|---|---|
| Parameter Count | 12B active / 52B total |
| Architecture | Hybrid SSM-Transformer |
| Context Length | 256K tokens |
| License | Jamba Open Model License |
| Knowledge Cutoff | March 5, 2024 |
| Languages | English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew |
What is AI21-Jamba-Mini-1.6?
AI21-Jamba-Mini-1.6 is a hybrid language model that interleaves Mamba (selective state space model, SSM) layers with Transformer attention layers to deliver strong performance on long-context tasks. With 12B active parameters out of 52B total (via a mixture-of-experts design), it offers a marked step up in efficiency and capability, particularly in enterprise applications such as RAG workflows and document analysis.
Implementation Details
The model's hybrid architecture can be deployed through several paths, including vLLM for efficient serving and the Hugging Face transformers library for direct use. It supports both full-precision and quantized operation, with ExpertsInt8 quantization enabling deployment on a single 80GB GPU (see the sketch after the list below).
- Requires a minimum of two 80GB GPUs for full-precision (BF16) operation
- Supports context lengths up to 256K tokens
- Includes optimized FlashAttention-2 and Mamba kernels
- Compatible with major serving stacks, including vLLM and the transformers library
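For the single-GPU path, a minimal vLLM sketch follows. It assumes the model is published under the Hugging Face ID `ai21labs/AI21-Jamba-Mini-1.6` and that your vLLM build supports the `experts_int8` quantization mode; `max_model_len` is an illustrative cap, not a verified default.

```python
# Single-GPU deployment sketch with ExpertsInt8 quantization.
# Assumptions: model ID and quantization mode as described above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ai21labs/AI21-Jamba-Mini-1.6",  # assumed Hugging Face model ID
    quantization="experts_int8",           # int8-quantized MoE experts -> fits one 80GB GPU
    max_model_len=100 * 1024,              # trade context length for memory headroom
)

params = SamplingParams(temperature=0.4, max_tokens=256)
outputs = llm.generate(
    ["Summarize the key risks in the attached contract in three bullets."],
    params,
)
print(outputs[0].outputs[0].text)
```

For full-precision BF16 serving across two GPUs, drop the `quantization` argument and set `tensor_parallel_size=2` instead.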
Core Capabilities
- Exceptional performance on long-context tasks and benchmarks
- Advanced tool use through a standardized API (see the chat-template sketch after this list)
- Support for fine-tuning through LoRA and QLoRA (see the fine-tuning sketch after this list)
- Multilingual support across 9 languages
- Superior benchmark performance compared to similarly sized open models
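Tool definitions are passed through the tokenizer's chat template. A minimal sketch, assuming the same model ID as above; the `get_weather` tool is hypothetical, and the exact format of the model's tool-call output should be checked against the model card.

```python
# Tool-use prompting via the transformers chat template.
# The tool schema below is illustrative; the model ID is assumed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-Mini-1.6")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Madrid?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,                 # rendered into the prompt by the chat template
    tokenize=False,
    add_generation_prompt=True,
)
# Pass `prompt` to your inference backend, then parse any tool-call
# block the model emits and return the tool result as a new message.
```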
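For fine-tuning, a QLoRA-style sketch with PEFT and bitsandbytes is shown below. The `target_modules` list names the attention projections commonly used for LoRA and is an assumption; inspect the model's actual module names before training.

```python
# QLoRA-style setup: 4-bit base weights plus trainable LoRA adapters.
# Assumptions: model ID as above; target_modules must be verified
# against the actual Jamba module names.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba-Mini-1.6",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```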
Frequently Asked Questions
Q: What makes this model unique?
The model's hybrid SSM-Transformer architecture and efficient parameter usage allow it to outperform other open models on quality, speed, and long-context tasks, approaching the capabilities of leading closed models while remaining open for deployment.
Q: What are the recommended use cases?
The model excels in enterprise applications requiring long-context processing, including RAG workflows, document analysis, and grounded question answering across lengthy documents (a minimal sketch follows below). It is particularly suited to deployments that need high-quality responses grounded in extensive context.
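As an illustration of the grounded-QA pattern, here is a minimal sketch with retrieval stubbed out; the passages are invented, and the model ID and quantization mode are the same assumptions as in the deployment sketch above. A production setup would route the prompt through the model's chat template rather than raw text.

```python
# Grounded QA over retrieved passages, packed into a single long prompt.
# The passages are hard-coded stand-ins for a retriever's output.
from vllm import LLM, SamplingParams

llm = LLM(model="ai21labs/AI21-Jamba-Mini-1.6", quantization="experts_int8")

passages = [  # in a real pipeline, these come from your retriever
    "Section 4.2: The warranty period is 24 months from the date of delivery.",
    "Section 7.1: Either party may terminate with 90 days' written notice.",
]
question = "How long is the warranty period?"

context = "\n\n".join(f"[doc {i + 1}] {p}" for i, p in enumerate(passages))
prompt = (
    "Answer using only the documents below and cite the doc number.\n\n"
    f"{context}\n\nQuestion: {question}\nAnswer:"
)

result = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=128))
print(result[0].outputs[0].text)
```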