AI21-Jamba-Mini-1.6
| Property | Value |
|---|---|
| Parameter Count | 12B active / 52B total |
| Architecture | Hybrid SSM-Transformer |
| Context Length | 256K tokens |
| License | Jamba Open Model License |
| Knowledge Cutoff | March 5, 2024 |
| Languages | English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew |
What is AI21-Jamba-Mini-1.6?
AI21-Jamba-Mini-1.6 is a hybrid language model that interleaves Mamba (selective state space model, SSM) layers with Transformer attention layers to deliver strong performance on long-context tasks. With 12B active parameters out of 52B total (via a mixture-of-experts design), it offers a marked step up in efficiency and capability, particularly in enterprise applications such as RAG workflows and document analysis.
Implementation Details
The model's hybrid architecture can be deployed through several paths, including vLLM for efficient serving and the Hugging Face transformers library for direct use. It supports both full-precision and quantized operation, with ExpertsInt8 quantization enabling deployment on a single 80GB GPU (see the sketch after the list below).
- Requires a minimum of two 80GB GPUs for full-precision (BF16) operation
- Supports context lengths up to 256K tokens
- Includes optimized FlashAttention-2 and Mamba kernels
- Compatible with major serving stacks, including vLLM and the transformers library
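For the single-GPU path, a minimal vLLM sketch follows. It assumes the model is published under the Hugging Face ID `ai21labs/AI21-Jamba-Mini-1.6` and that your vLLM build supports the `experts_int8` quantization mode; `max_model_len` is an illustrative cap, not a verified default.

```python
# Single-GPU deployment sketch with ExpertsInt8 quantization.
# Assumptions: model ID and quantization mode as described above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ai21labs/AI21-Jamba-Mini-1.6",  # assumed Hugging Face model ID
    quantization="experts_int8",           # int8-quantized MoE experts -> fits one 80GB GPU
    max_model_len=100 * 1024,              # trade context length for memory headroom
)

params = SamplingParams(temperature=0.4, max_tokens=256)
outputs = llm.generate(
    ["Summarize the key risks in the attached contract in three bullets."],
    params,
)
print(outputs[0].outputs[0].text)
```

For full-precision BF16 serving across two GPUs, drop the `quantization` argument and set `tensor_parallel_size=2` instead.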
Core Capabilities
- Exceptional performance on long-context tasks and benchmarks
- Advanced tool use through a standardized API (see the chat-template sketch after this list)
- Support for fine-tuning through LoRA and QLoRA (see the fine-tuning sketch after this list)
- Multilingual support across 9 languages
- Superior benchmark performance compared to similarly sized open models
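Tool definitions are passed through the tokenizer's chat template. A minimal sketch, assuming the same model ID as above; the `get_weather` tool is hypothetical, and the exact format of the model's tool-call output should be checked against the model card.

```python
# Tool-use prompting via the transformers chat template.
# The tool schema below is illustrative; the model ID is assumed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai21labs/AI21-Jamba-Mini-1.6")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Madrid?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,                 # rendered into the prompt by the chat template
    tokenize=False,
    add_generation_prompt=True,
)
# Pass `prompt` to your inference backend, then parse any tool-call
# block the model emits and return the tool result as a new message.
```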
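For fine-tuning, a QLoRA-style sketch with PEFT and bitsandbytes is shown below. The `target_modules` list names the attention projections commonly used for LoRA and is an assumption; inspect the model's actual module names before training.

```python
# QLoRA-style setup: 4-bit base weights plus trainable LoRA adapters.
# Assumptions: model ID as above; target_modules must be verified
# against the actual Jamba module names.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/AI21-Jamba-Mini-1.6",
    quantization_config=bnb,
    device_map="auto",
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```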
Frequently Asked Questions
Q: What makes this model unique?
The model's hybrid SSM-Transformer architecture and efficient parameter usage allow it to outperform other open models on quality, speed, and long-context tasks, approaching the capabilities of leading closed models while remaining open for deployment.
Q: What are the recommended use cases?
The model excels in enterprise applications requiring long-context processing, including RAG workflows, document analysis, and grounded question answering across lengthy documents (a minimal sketch follows below). It is particularly suited to deployments that need high-quality responses grounded in extensive context.
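As an illustration of the grounded-QA pattern, here is a minimal sketch with retrieval stubbed out; the passages are invented, and the model ID and quantization mode are the same assumptions as in the deployment sketch above. A production setup would route the prompt through the model's chat template rather than raw text.

```python
# Grounded QA over retrieved passages, packed into a single long prompt.
# The passages are hard-coded stand-ins for a retriever's output.
from vllm import LLM, SamplingParams

llm = LLM(model="ai21labs/AI21-Jamba-Mini-1.6", quantization="experts_int8")

passages = [  # in a real pipeline, these come from your retriever
    "Section 4.2: The warranty period is 24 months from the date of delivery.",
    "Section 7.1: Either party may terminate with 90 days' written notice.",
]
question = "How long is the warranty period?"

context = "\n\n".join(f"[doc {i + 1}] {p}" for i, p in enumerate(passages))
prompt = (
    "Answer using only the documents below and cite the doc number.\n\n"
    f"{context}\n\nQuestion: {question}\nAnswer:"
)

result = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=128))
print(result[0].outputs[0].text)
```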