Zamba2-2.7B-instruct

Zyphra

A 2.7B-parameter hybrid SSM-transformer model that excels at instruction-following tasks, matching or outperforming several larger models while offering lower inference latency and a reduced memory footprint.

Property         Value
Parameter Count  2.7B
License          Apache 2.0
Model Type       Hybrid SSM-Transformer
Tensor Type      F32/BF16

What is Zamba2-2.7B-instruct?

Zamba2-2.7B-instruct is a hybrid model that combines Mamba2 state-space layers with transformer attention blocks. Fine-tuned on multiple instruction-following and chat datasets, it delivers performance that surpasses both comparably sized and larger instruction-tuned models, including Gemma2-2B-Instruct and Mistral-7B-Instruct.
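
As a minimal usage sketch (assuming the weights are published as Zyphra/Zamba2-2.7B-instruct on Hugging Face and that the installed transformers build includes Zamba2 support; consult the official model card for exact requirements):

```python
# Minimal sketch: load Zamba2-2.7B-instruct and generate one chat response.
# Assumes the checkpoint id "Zyphra/Zamba2-2.7B-instruct" and a transformers
# version with Zamba2 support; adjust to match the official model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 matches the tensor type listed above
    device_map="auto",
)

# Instruction-tuned checkpoints typically ship a chat template.
messages = [{"role": "user", "content": "Explain state-space models in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=150, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```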

Implementation Details

The model architecture features a backbone of Mamba2 layers interleaved with shared attention layers, i.e., attention blocks whose weights are reused at several depths. LoRA projection matrices applied to the shared MLP let each reuse point specialize to its depth while adding only minimal parameter overhead. The model was fine-tuned in two steps: supervised fine-tuning (SFT) on ultrachat_200k and Infinity-Instruct, followed by DPO training on multiple preference datasets. A simplified structural sketch of the interleaving pattern follows the feature list below.

  • Innovative hybrid architecture combining SSM and transformer blocks
  • Efficient parameter sharing through shared attention mechanisms
  • Enhanced information flow through embedding concatenation
  • Optimized for both performance and efficiency
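
The interleaving pattern can be pictured with the following simplified, illustrative sketch. It is not the actual Zamba2 implementation: SSMBlock is a gated-MLP stand-in for a real Mamba2 layer, and the per-call low-rank adapters only gesture at the LoRA projectors described above.

```python
# Illustrative-only sketch of a hybrid backbone: SSM-style blocks interleaved
# with a single *shared* attention block, plus small per-call LoRA adapters.
import torch
import torch.nn as nn


class SSMBlock(nn.Module):
    """Placeholder for a Mamba2 layer: any sequence mixer could slot in here."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, 2 * dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out_proj(h * torch.sigmoid(gate))


class SharedAttentionBlock(nn.Module):
    """One attention block whose weights are reused at several depths; a small
    low-rank (LoRA-style) adapter per reuse point adds depth-specific behavior."""
    def __init__(self, dim, n_heads, n_calls, rank=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.lora_a = nn.ModuleList([nn.Linear(dim, rank, bias=False) for _ in range(n_calls)])
        self.lora_b = nn.ModuleList([nn.Linear(rank, dim, bias=False) for _ in range(n_calls)])

    def forward(self, x, call_idx):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        lora_out = self.lora_b[call_idx](self.lora_a[call_idx](h))
        return x + attn_out + lora_out


class HybridBackbone(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_ssm_layers=6, attn_every=3):
        super().__init__()
        self.ssm_layers = nn.ModuleList([SSMBlock(dim) for _ in range(n_ssm_layers)])
        self.shared_attn = SharedAttentionBlock(dim, n_heads, n_ssm_layers // attn_every)
        self.attn_every = attn_every

    def forward(self, x):
        call_idx = 0
        for i, layer in enumerate(self.ssm_layers, start=1):
            x = layer(x)
            if i % self.attn_every == 0:  # reuse the same attention weights
                x = self.shared_attn(x, call_idx)
                call_idx += 1
        return x


if __name__ == "__main__":
    model = HybridBackbone()
    tokens = torch.randn(2, 16, 256)  # (batch, sequence, hidden)
    print(model(tokens).shape)        # torch.Size([2, 16, 256])
```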

Core Capabilities

  • Superior instruction-following abilities (MT-Bench score: 72.40)
  • Extremely low inference latency
  • Reduced memory footprint compared to traditional transformers
  • Excellent performance in reasoning tasks
  • Efficient on-device deployment capabilities

Frequently Asked Questions

Q: What makes this model unique?

The model's hybrid architecture combining Mamba2 state-space modeling with transformer blocks enables exceptional performance while maintaining low computational requirements. It achieves better results than many larger models while using fewer parameters.

Q: What are the recommended use cases?

The model is particularly well-suited for on-device applications requiring strong instruction-following capabilities, rapid response times, and efficient resource usage. It excels in general-purpose text generation, reasoning tasks, and conversational applications.
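
To get a rough sense of response latency on a target device, a simple timing sketch like the one below can help (again assuming the Zyphra/Zamba2-2.7B-instruct checkpoint and a transformers build with Zamba2 support; the prompt and generation settings are arbitrary):

```python
# Rough latency check for a short generation (sketch only; assumes the
# "Zyphra/Zamba2-2.7B-instruct" checkpoint as in the loading example above).
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Give three tips for writing clear emails."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - input_ids.shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f} s ({new_tokens / elapsed:.1f} tokens/s)")
```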
