AI21-Jamba-Mini-1.6

Maintained By
ai21labs

  • Parameter Count: 12B active / 52B total
  • Architecture: Hybrid SSM-Transformer
  • Context Length: 256K tokens
  • License: Jamba Open Model License
  • Knowledge Cutoff: March 5, 2024
  • Languages: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, Hebrew

What is AI21-Jamba-Mini-1.6?

AI21-Jamba-Mini-1.6 is a language model that combines selective state-space model (SSM, i.e., Mamba) layers with Transformer layers to deliver strong performance on long-context tasks. With 12B active parameters out of 52B total, it represents a significant advance in efficiency and capability, particularly in enterprise applications such as RAG workflows and document analysis.

Implementation Details

The model's hybrid architecture can be deployed in several ways, including vLLM for efficient inference and the Hugging Face transformers library for direct use. It supports both full-precision and quantized operation, with ExpertsInt8 quantization enabling deployment on a single 80GB GPU.

  • Requires a minimum of 2x 80GB GPUs for full operation in BF16 precision
  • Supports up to 256K context length
  • Includes optimized FlashAttention2 and Mamba kernels
  • Compatible with major deployment frameworks like vLLM and transformers
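The GPU requirements above follow directly from the parameter counts. As a back-of-the-envelope sketch (weights only; real footprints also include activations and KV/SSM cache), BF16 weights for 52B parameters exceed a single 80GB GPU, while 8-bit weights fit comfortably:

```python
# Rough weight-memory estimate for the 52B total parameters.
TOTAL_PARAMS = 52e9
GPU_MEM_GB = 80

def weight_mem_gb(params, bytes_per_param):
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

bf16 = weight_mem_gb(TOTAL_PARAMS, 2)   # BF16: 2 bytes/param -> ~104 GB
int8 = weight_mem_gb(TOTAL_PARAMS, 1)   # Int8: 1 byte/param  -> ~52 GB

gpus_needed = -(-bf16 // GPU_MEM_GB)    # ceiling division
print(f"BF16 weights: ~{bf16:.0f} GB -> {gpus_needed:.0f}x 80GB GPUs")
print(f"Int8 weights: ~{int8:.0f} GB -> fits on a single 80GB GPU")
```

This is why BF16 inference calls for two 80GB GPUs, while ExpertsInt8 quantization brings the model within a single one.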

Core Capabilities

  • Exceptional performance on long-context tasks and benchmarks
  • Advanced tool use capabilities with standardized API
  • Support for fine-tuning through LoRA and QLoRA
  • Multilingual support across nine languages
  • Superior benchmark performance compared to similar-sized models
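The standardized tool-use API referred to above follows the OpenAI-style function schema consumed by chat templates (e.g. via `tokenizer.apply_chat_template(messages, tools=tools)` in transformers). A minimal sketch of such a schema; the tool name and its fields here are illustrative, not part of the model card:

```python
# Illustrative OpenAI-style tool definition (hypothetical "get_weather" tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tel Aviv?"}]
# Given these inputs, the model is expected to emit an assistant message
# containing a tool call such as:
#   {"name": "get_weather", "arguments": {"city": "Tel Aviv"}}
# which the caller executes before returning the result as a "tool" message.
```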

Frequently Asked Questions

Q: What makes this model unique?

The model's hybrid SSM-Transformer architecture and efficient parameter usage allow it to outperform other open models on quality, speed, and long-context tasks, approaching the capabilities of leading closed models while remaining open for deployment.

Q: What are the recommended use cases?

The model excels in enterprise applications requiring long context processing, including RAG workflows, document analysis, and grounded question answering across lengthy documents. It's particularly suited for deployments requiring high-quality responses with extensive context understanding.
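A 256K-token window simplifies the retrieval half of a RAG pipeline: rather than aggressive chunking, whole documents can be packed into the prompt. A minimal sketch of that pattern; the keyword-overlap scoring is purely illustrative (real pipelines would use embedding similarity):

```python
def score(query, doc):
    """Naive relevance score: fraction of query words present in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def build_prompt(query, docs, top_k=2):
    """Rank documents by relevance and pack the top_k into a grounded prompt.
    With a 256K-token window, entire documents fit verbatim."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context = "\n\n---\n\n".join(ranked[:top_k])
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {query}"

docs = [
    "The Jamba architecture interleaves Mamba and Transformer layers.",
    "Croissants are a laminated pastry of French origin.",
]
prompt = build_prompt("What layers does the Jamba architecture use?", docs, top_k=1)
```

The resulting prompt contains only the relevant document, and the instruction to answer from context is what grounds the model's response.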

🍰 Interested in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.