AI21-Jamba-Mini-1.5
| Property | Value |
|---|---|
| Parameters | 12B active / 52B total |
| Context Length | 256K tokens |
| Architecture | Hybrid SSM-Transformer (Jamba) |
| License | Jamba Open Model License |
| Knowledge Cutoff | March 5, 2024 |
What is AI21-Jamba-Mini-1.5?
AI21-Jamba-Mini-1.5 is a hybrid SSM-Transformer model and part of the Jamba 1.5 family. AI21 reports up to 2.5X faster long-context inference than comparably sized models, while maintaining competitive output quality across a range of tasks.
Implementation Details
The architecture integrates non-Transformer (Mamba/SSM) components at scale and supports a 256K-token context window. The model can be deployed in several configurations, from full precision across multiple GPUs to quantized versions that fit on a single GPU.
- Supports multiple deployment options, including vLLM and the Hugging Face transformers library
- ExpertsInt8 quantization enables running on a single 80GB GPU
- Optimized for business use cases with function calling and structured output capabilities
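As a concrete illustration of the single-GPU deployment path, the sketch below assembles a `vllm serve` command line using ExpertsInt8 quantization. This is a minimal sketch, assuming a vLLM build with Jamba and `experts_int8` support; the exact flag values (context length, parallelism) are illustrative, not AI21-recommended settings.

```python
# Hypothetical vLLM serving configuration for Jamba-Mini-1.5.
# Assumption: a vLLM version that supports the Jamba architecture and
# the "experts_int8" quantization mode (enables a single 80GB GPU).
vllm_args = {
    "model": "ai21labs/AI21-Jamba-1.5-Mini",
    "quantization": "experts_int8",   # int8 MoE expert weights
    "max_model_len": 262144,          # 256K-token context window
    "tensor_parallel_size": 1,        # single GPU thanks to quantization
}

# Turn the dict into the equivalent CLI invocation.
command = ["vllm", "serve", vllm_args["model"]] + [
    f"--{k.replace('_', '-')}={v}" for k, v in vllm_args.items() if k != "model"
]
print(" ".join(command))
```

Without quantization, the full-precision model would instead be sharded across multiple GPUs by raising `tensor_parallel_size`.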
Core Capabilities
- Multilingual support for 9 languages including English, Spanish, French, and Arabic
- Strong performance on benchmarks like MMLU (69.7%) and GSM-8K (75.8%)
- Tool use and grounded generation support
- JSON mode for structured output generation
- Fine-tuning support through LoRA and QLoRA
Frequently Asked Questions
Q: What makes this model unique?
A: The model interleaves Transformer attention layers with State Space Model (SSM) layers, improving long-context handling and inference speed while maintaining output quality. It is optimized for business applications and supports context lengths up to 256K tokens.
Q: What are the recommended use cases?
A: The model is aimed at business applications that need structured output, function calling, or long-context understanding. It is particularly well suited to RAG pipelines, multilingual tasks, and workflows requiring tool use or JSON-formatted responses.
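For the RAG use case, the 256K-token window lets many retrieved passages be concatenated directly into the prompt. The helper below is a minimal sketch with no retrieval library assumed; its character budget is a crude stand-in for a real token count (names and the budget heuristic are this example's assumptions).

```python
def build_rag_prompt(question: str, passages: list[str], max_chars: int = 900_000) -> str:
    """Pack retrieved passages into a single long-context prompt.

    max_chars is a rough proxy for the 256K-token budget; a real
    pipeline would count tokens with the model's tokenizer instead.
    """
    context, used = [], 0
    for p in passages:
        if used + len(p) > max_chars:
            break  # stop before overflowing the context budget
        context.append(p)
        used += len(p)
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What is Jamba?",
    ["Jamba is a hybrid SSM-Transformer model from AI21."],
)
```

Grounded generation then amounts to sending `prompt` to the model and instructing it to cite only the supplied context.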