Zamba2-1.2B
| Property | Value |
|---|---|
| Parameter Count | 1.22B |
| Model Type | Hybrid SSM-Transformer |
| License | Apache 2.0 |
| Training Data | 3T tokens + 100B high-quality tokens |
| Paper | Zamba Architecture Paper |
What is Zamba2-1.2B?
Zamba2-1.2B is a hybrid language model that combines state-space (Mamba) blocks with transformer attention layers. The design aims for strong quality at low computational and memory cost, making it practical to run where resources are limited. The model was trained on 3 trillion tokens of text and code, followed by fine-tuning on 100B high-quality tokens.
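A minimal loading and generation sketch is shown below. It assumes the Hugging Face repository id `Zyphra/Zamba2-1.2B`, a `transformers` version that includes Zamba2 support, and a CUDA GPU; adjust these to your setup.

```python
# Minimal sketch, assuming the repo id "Zyphra/Zamba2-1.2B" and a transformers
# release with Zamba2 support; exact version requirements may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-1.2B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-1.2B",
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on a single GPU
    device_map="cuda",
)

prompt = "A hybrid SSM-transformer language model is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```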
Implementation Details
The model combines Mamba2 blocks with shared transformer layers and incorporates several key design choices (a simplified structural sketch follows this list):
- Integration of Mamba2 blocks in a hybrid architecture
- LoRA projectors that specialize the shared transformer layers at each depth
- Rotary position embeddings in shared attention layers
- Mistral v0.1 tokenizer implementation
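The sketch below is a simplified illustration of this structure, not the released implementation: a placeholder stands in for the Mamba2 block, rotary position embeddings are omitted, and names such as `SharedAttentionBlock` and all hyperparameters are illustrative assumptions. It shows the key idea of one shared attention block reused at several depths, specialized per depth by low-rank (LoRA-style) adapters.

```python
# Simplified structural sketch (not the released implementation).
import torch
import torch.nn as nn


class LoRA(nn.Module):
    """Low-rank, depth-specific adapter added around the shared block."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op delta

    def forward(self, x):
        return self.up(self.down(x))


class SharedAttentionBlock(nn.Module):
    """One transformer block whose weights are reused at several depths."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, lora_in: LoRA, lora_out: LoRA):
        # Depth-specific low-rank deltas specialize the shared weights per call site.
        h = self.norm(x + lora_in(x))
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + lora_out(attn_out)


class MambaStandIn(nn.Module):
    """Placeholder for a Mamba2 block (a real implementation would use an SSM)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = nn.Sequential(nn.Linear(dim, 2 * dim), nn.SiLU(), nn.Linear(2 * dim, dim))

    def forward(self, x):
        return x + self.mix(self.norm(x))


class HybridBackbone(nn.Module):
    def __init__(self, dim: int = 256, n_blocks: int = 12, share_every: int = 6):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(MambaStandIn(dim) for _ in range(n_blocks))
        self.shared_attn = SharedAttentionBlock(dim)  # a single set of attention weights
        n_shared = n_blocks // share_every
        self.lora_in = nn.ModuleList(LoRA(dim) for _ in range(n_shared))   # one adapter pair
        self.lora_out = nn.ModuleList(LoRA(dim) for _ in range(n_shared))  # per shared depth
        self.share_every = share_every

    def forward(self, x):
        shared_idx = 0
        for i, block in enumerate(self.mamba_blocks):
            x = block(x)
            if (i + 1) % self.share_every == 0:  # interleave the shared attention block
                x = self.shared_attn(x, self.lora_in[shared_idx], self.lora_out[shared_idx])
                shared_idx += 1
        return x


x = torch.randn(2, 16, 256)       # (batch, sequence, hidden)
print(HybridBackbone()(x).shape)  # torch.Size([2, 16, 256])
```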
Core Capabilities
- State-of-the-art performance among models under 2B parameters
- Very low inference latency and fast generation (a quick benchmark sketch follows this list)
- Significantly smaller memory footprint compared to traditional transformers
- Efficient on-device deployment capabilities
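The latency and memory claims are easy to check on your own hardware. The snippet below is an illustrative benchmark sketch that reuses the `model` and `tokenizer` objects from the loading example above and assumes a CUDA device; absolute numbers will vary with hardware and generation settings.

```python
# Illustrative benchmark sketch; assumes `model` and `tokenizer` from the
# loading example above and a CUDA device.
import time
import torch

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/s, "
      f"peak memory {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```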
Frequently Asked Questions
Q: What makes this model unique?
The model's hybrid architecture combining Mamba2 blocks with transformer layers, along with its innovative use of LoRA projectors and rotary position embeddings, enables exceptional performance while maintaining efficiency in both computation and memory usage.
Q: What are the recommended use cases?
Zamba2-1.2B is well suited to general-purpose text generation, particularly where on-device deployment or limited compute is a constraint. Note, however, that the model is not fine-tuned for instruction following or chat, so it works best when prompts are written as text to be continued rather than as conversational turns.
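As a hypothetical illustration of completion-style prompting (reusing the `model` and `tokenizer` from the loading example above; the prompt and sampling settings are arbitrary):

```python
# Phrase the request as text to be continued, since the model is not chat-tuned.
prompt = (
    "Below is a short summary of the Zamba2 hybrid architecture.\n\n"
    "Summary:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=80, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```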