Zamba2-1.2B
| Property | Value |
|---|---|
| Parameter Count | 1.22B |
| Model Type | Hybrid SSM-Transformer |
| License | Apache 2.0 |
| Training Data | 3T tokens + 100B high-quality tokens |
| Paper | Zamba Architecture Paper |
What is Zamba2-1.2B?
Zamba2-1.2B is a hybrid language model that combines state-space (Mamba) blocks with transformer attention layers. The design aims for strong quality at low computational and memory cost, making it practical to run where resources are limited. The model was trained on 3 trillion tokens of text and code, followed by fine-tuning on 100B high-quality tokens.
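A minimal loading and generation sketch is shown below. It assumes the Hugging Face repository id `Zyphra/Zamba2-1.2B`, a `transformers` version that includes Zamba2 support, and a CUDA GPU; adjust these to your setup.

```python
# Minimal sketch, assuming the repo id "Zyphra/Zamba2-1.2B" and a transformers
# release with Zamba2 support; exact version requirements may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-1.2B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-1.2B",
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on a single GPU
    device_map="cuda",
)

prompt = "A hybrid SSM-transformer language model is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```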
Implementation Details
The model combines Mamba2 blocks with shared transformer layers and incorporates several key design choices (a simplified structural sketch follows this list):
- Integration of Mamba2 blocks in a hybrid architecture
- LoRA projectors that specialize the shared transformer layers at each depth
- Rotary position embeddings in shared attention layers
- Mistral v0.1 tokenizer implementation
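The sketch below is a simplified illustration of this structure, not the released implementation: a placeholder stands in for the Mamba2 block, rotary position embeddings are omitted, and names such as `SharedAttentionBlock` and all hyperparameters are illustrative assumptions. It shows the key idea of one shared attention block reused at several depths, specialized per depth by low-rank (LoRA-style) adapters.

```python
# Simplified structural sketch (not the released implementation).
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank, depth-specific adapter added around the shared block."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op delta

    def forward(self, x):
        return self.up(self.down(x))


class SharedAttentionBlock(nn.Module):
    """One transformer block whose weights are reused at several depths."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, lora_in: LoRA, lora_out: LoRA):
        # Depth-specific low-rank deltas specialize the shared weights per call site.
        h = self.norm(x + lora_in(x))
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + lora_out(attn_out)


class MambaStandIn(nn.Module):
    """Placeholder for a Mamba2 block (a real implementation would use an SSM)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Sequential(nn.Linear(dim, 2 * dim), nn.SiLU(), nn.Linear(2 * dim, dim))

    def forward(self, x):
        return x + self.mix(self.norm(x))


class HybridBackbone(nn.Module):
    def __init__(self, dim: int = 256, n_blocks: int = 12, share_every: int = 6):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(MambaStandIn(dim) for _ in range(n_blocks))
        self.shared_attn = SharedAttentionBlock(dim)  # a single set of attention weights
        n_shared = n_blocks // share_every
        self.lora_in = nn.ModuleList(LoRA(dim) for _ in range(n_shared))   # one adapter pair
        self.lora_out = nn.ModuleList(LoRA(dim) for _ in range(n_shared))  # per shared depth
        self.share_every = share_every

    def forward(self, x):
        shared_idx = 0
        for i, block in enumerate(self.mamba_blocks):
            x = block(x)
            if (i + 1) % self.share_every == 0:  # interleave the shared attention block
                x = self.shared_attn(x, self.lora_in[shared_idx], self.lora_out[shared_idx])
                shared_idx += 1
        return x


x = torch.randn(2, 16, 256)       # (batch, sequence, hidden)
print(HybridBackbone()(x).shape)  # torch.Size([2, 16, 256])
```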
Core Capabilities
- State-of-the-art performance among models under 2B parameters
- Very low inference latency and fast generation (a quick benchmark sketch follows this list)
- Significantly smaller memory footprint compared to traditional transformers
- Efficient on-device deployment capabilities
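The latency and memory claims are easy to check on your own hardware. The snippet below is an illustrative benchmark sketch that reuses the `model` and `tokenizer` objects from the loading example above and assumes a CUDA device; absolute numbers will vary with hardware and generation settings.

```python
# Illustrative benchmark sketch; assumes `model` and `tokenizer` from the
# loading example above and a CUDA device.
import time
import torch

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s, "
      f"peak memory {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```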
Frequently Asked Questions
Q: What makes this model unique?
The model's hybrid architecture combining Mamba2 blocks with transformer layers, along with its innovative use of LoRA projectors and rotary position embeddings, enables exceptional performance while maintaining efficiency in both computation and memory usage.
Q: What are the recommended use cases?
Zamba2-1.2B is well suited to general-purpose text generation, particularly where on-device deployment or limited compute is a constraint. Note, however, that the model is not fine-tuned for instruction following or chat, so it works best when prompts are written as text to be continued rather than as conversational turns.
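As a hypothetical illustration of completion-style prompting (reusing the `model` and `tokenizer` from the loading example above; the prompt and sampling settings are arbitrary):

```python
# Phrase the request as text to be continued, since the model is not chat-tuned.
prompt = (
    "Below is a short summary of the Zamba2 hybrid architecture.\n\n"
    "Summary:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```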