AI21-Jamba-Mini-1.5
| Property | Value |
|---|---|
| Parameters | 12B active / 52B total |
| Context Length | 256K tokens |
| Architecture | Hybrid SSM-Transformer (Jamba) |
| License | Jamba Open Model License |
| Knowledge Cutoff | March 5, 2024 |
What is AI21-Jamba-Mini-1.5?
AI21-Jamba-Mini-1.5 is a hybrid SSM-Transformer model and part of the Jamba 1.5 family. AI21 reports up to 2.5X faster long-context inference than comparably sized models, while maintaining competitive output quality across a range of tasks.
Implementation Details
The architecture integrates non-Transformer (Mamba/SSM) components at scale and supports a 256K-token context window. The model can be deployed in several configurations, from full precision across multiple GPUs to quantized versions that fit on a single GPU.
- Supports multiple deployment options, including vLLM and the Hugging Face transformers library
- ExpertsInt8 quantization enables running on a single 80GB GPU
- Optimized for business use cases with function calling and structured output capabilities
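As a concrete illustration of the single-GPU deployment path, the sketch below assembles a `vllm serve` command line using ExpertsInt8 quantization. This is a minimal sketch, assuming a vLLM build with Jamba and `experts_int8` support; the exact flag values (context length, parallelism) are illustrative, not AI21-recommended settings.

```python
# Hypothetical vLLM serving configuration for Jamba-Mini-1.5.
# Assumption: a vLLM version that supports the Jamba architecture and
# the "experts_int8" quantization mode (enables a single 80GB GPU).
vllm_args = {
    "model": "ai21labs/AI21-Jamba-1.5-Mini",
    "quantization": "experts_int8",   # int8 MoE expert weights
    "max_model_len": 262144,          # 256K-token context window
    "tensor_parallel_size": 1,        # single GPU thanks to quantization
}

# Turn the dict into the equivalent CLI invocation.
command = ["vllm", "serve", vllm_args["model"]] + [
    f"--{k.replace('_', '-')}={v}" for k, v in vllm_args.items() if k != "model"
]
print(" ".join(command))
```

Without quantization, the full-precision model would instead be sharded across multiple GPUs by raising `tensor_parallel_size`.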
Core Capabilities
- Multilingual support for 9 languages including English, Spanish, French, and Arabic
- Strong performance on benchmarks like MMLU (69.7%) and GSM-8K (75.8%)
- Tool use and grounded generation support
- JSON mode for structured output generation
- Fine-tuning support through LoRA and QLoRA
Frequently Asked Questions
Q: What makes this model unique?
A: The model interleaves Transformer attention layers with State Space Model (SSM) layers, improving long-context handling and inference speed while maintaining output quality. It is optimized for business applications and supports context lengths up to 256K tokens.
Q: What are the recommended use cases?
A: The model is aimed at business applications that need structured output, function calling, or long-context understanding. It is particularly well suited to RAG pipelines, multilingual tasks, and workflows requiring tool use or JSON-formatted responses.
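For the RAG use case, the 256K-token window lets many retrieved passages be concatenated directly into the prompt. The helper below is a minimal sketch with no retrieval library assumed; its character budget is a crude stand-in for a real token count (names and the budget heuristic are this example's assumptions).

```python
def build_rag_prompt(question: str, passages: list[str], max_chars: int = 900_000) -> str:
    """Pack retrieved passages into a single long-context prompt.

    max_chars is a rough proxy for the 256K-token budget; a real
    pipeline would count tokens with the model's tokenizer instead.
    """
    context, used = [], 0
    for p in passages:
        if used + len(p) > max_chars:
            break  # stop before overflowing the context budget
        context.append(p)
        used += len(p)
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What is Jamba?",
    ["Jamba is a hybrid SSM-Transformer model from AI21."],
)
```

Grounded generation then amounts to sending `prompt` to the model and instructing it to cite only the supplied context.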