Hymba-1.5B-Base
| Property | Value |
|---|---|
| Parameter Count | 1.52B |
| Model Type | Text Generation |
| Architecture | Hybrid Mamba-Attention |
| License | NVIDIA Open Model License |
| Paper | arXiv:2411.13676 |
What is Hymba-1.5B-Base?
Hymba-1.5B-Base is a language model developed by NVIDIA that introduces a hybrid-head architecture in which Mamba (state-space) heads and attention heads run in parallel within each layer. With 1.52B parameters, it is designed for efficient inference and, per NVIDIA's reported benchmarks, outperforms other publicly available sub-2B parameter models.
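Below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as `nvidia/Hymba-1.5B-Base` and is loaded with `trust_remote_code` since the architecture is custom; the dtype and device choices are illustrative.

```python
# Minimal text-generation sketch (assumed Hub repo id and loading options).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/Hymba-1.5B-Base"  # assumed Hugging Face Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,       # assumed dtype; adjust to your hardware
    trust_remote_code=True,           # custom hybrid architecture code
).to("cuda" if torch.cuda.is_available() else "cpu")

# Batch size 1 only: the current implementation does not support larger batches.
inputs = tokenizer(
    "The benefits of hybrid Mamba-attention models include",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```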
Implementation Details
The model consists of 32 layers with an embedding size of 1600 and 25 attention heads. Each layer combines standard attention heads with Mamba heads that process the same input in parallel, with their outputs fused. The SSM state size is 16, and only 3 layers use full (global) attention; the remaining layers use sliding window attention. A simplified sketch of the parallel hybrid block follows the list below.
- Embedding dimension: 1600
- MLP intermediate dimension: 5504
- Utilizes Grouped-Query Attention (GQA)
- Implements Rotary Position Embeddings (RoPE)
- Prepends learnable meta tokens to the input to improve attention behavior
- Shares the KV cache across layers as well as across heads
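The following is a minimal, self-contained PyTorch sketch of the parallel hybrid-block idea only, not NVIDIA's implementation: plain multi-head attention stands in for the GQA/sliding-window attention path, a toy gated recurrence stands in for a real Mamba head, and meta tokens are omitted. The fusion step (normalize each path's output, then average, plus a residual) is a simplification of the approach described in the Hymba paper.

```python
# Simplified sketch: attention and an SSM-like path run over the same input,
# and their normalized outputs are averaged. Dimensions mirror the card
# (hidden size 1600, 25 attention heads, SSM state size 16).
import torch
import torch.nn as nn


class ToySSMHead(nn.Module):
    """A minimal gated linear recurrence standing in for a Mamba head."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_state)
        self.gate = nn.Linear(d_model, d_state)
        self.out_proj = nn.Linear(d_state, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        u = self.in_proj(x)
        a = torch.sigmoid(self.gate(x))            # per-step decay in (0, 1)
        h = torch.zeros_like(u[:, 0])
        states = []
        for t in range(x.size(1)):                 # sequential scan (illustrative only)
            h = a[:, t] * h + (1 - a[:, t]) * u[:, t]
            states.append(h)
        return self.out_proj(torch.stack(states, dim=1))


class ParallelHybridBlock(nn.Module):
    """Attention and SSM sublayers applied in parallel to the same input."""

    def __init__(self, d_model: int = 1600, n_heads: int = 25):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ssm = ToySSMHead(d_model)
        self.attn_out_norm = nn.LayerNorm(d_model)
        self.ssm_out_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        ssm_out = self.ssm(h)
        # Fuse the two parallel paths: normalize each output, average, add residual.
        return x + 0.5 * (self.attn_out_norm(attn_out) + self.ssm_out_norm(ssm_out))


block = ParallelHybridBlock()
y = block(torch.randn(1, 8, 1600))
print(y.shape)  # torch.Size([1, 8, 1600])
```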
Core Capabilities
- Strong performance relative to other publicly available sub-2B parameter models
- Efficient text generation with parallel processing
- Commercial-ready deployment
- Flexible adaptation for various NLP tasks
- Memory-efficient inference through KV cache sharing and sliding window attention (see the illustrative estimate after this list)
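As a rough illustration of why these choices save memory, here is a back-of-the-envelope KV-cache estimate. Only the layer count (32) and the 3 full-attention layers come from this card; the KV-head count, head dimension, window size, sharing group size, and sequence length are assumptions chosen for illustration.

```python
# Back-of-the-envelope KV-cache estimate showing how sliding-window attention
# and cross-layer cache sharing shrink memory versus full attention everywhere.

def kv_cache_bytes(num_layer_caches, kv_heads, head_dim, cached_tokens, bytes_per_elem=2):
    # 2x for keys and values, stored for every cached token in every layer cache.
    return 2 * num_layer_caches * kv_heads * head_dim * cached_tokens * bytes_per_elem

seq_len, window = 8192, 1024          # assumed context length and window size
layers, full_attn_layers = 32, 3      # from the model card
kv_heads, head_dim = 5, 64            # assumed GQA setting

# Baseline: every layer keeps a full-length cache, no sharing.
baseline = kv_cache_bytes(layers, kv_heads, head_dim, seq_len)

# Hybrid-style: 3 global layers cache the full context, the rest only a window,
# and consecutive layers share one cache (assumed groups of 2 -> ~half the caches).
global_part = kv_cache_bytes(full_attn_layers, kv_heads, head_dim, seq_len)
window_part = kv_cache_bytes((layers - full_attn_layers) // 2, kv_heads, head_dim, window)
hybrid = global_part + window_part

print(f"baseline ~{baseline / 2**20:.1f} MiB, hybrid ~{hybrid / 2**20:.1f} MiB")
```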
Frequently Asked Questions
Q: What makes this model unique?
The hybrid architecture, which runs Mamba and attention heads in parallel within each layer, together with meta tokens and cross-layer KV sharing, makes the model notably efficient for its size. The parallel design processes inputs through both attention and SSM paths simultaneously while maintaining strong performance with a relatively small parameter count.
Q: What are the recommended use cases?
Hymba-1.5B-Base is suitable for a range of text generation tasks and can be adapted for commercial applications under the NVIDIA Open Model License. Note, however, that the current implementation supports generation only at batch size 1, due to constraints involving meta tokens and sliding window attention.