Hymba-1.5B-Base

Maintained by: nvidia

  • Parameter Count: 1.52B
  • Model Type: Text Generation
  • Architecture: Hybrid Mamba-Attention
  • License: NVIDIA Open Model License
  • Paper: arXiv:2411.13676

What is Hymba-1.5B-Base?

Hymba-1.5B-Base is a language model developed by NVIDIA with a hybrid architecture that runs Mamba and attention heads in parallel within each layer. At 1.52B parameters, it is designed for efficient inference and reports benchmark performance surpassing other publicly available sub-2B parameter models.

Implementation Details

The model consists of 32 layers with an embedding size of 1600 and 25 attention heads. Each layer pairs standard attention heads with Mamba heads that process the input in parallel. The architecture uses 16 SSM states and 3 full attention layers, with sliding window attention in the remaining layers; a loading sketch follows the specification list below.

  • Embedding dimension: 1600
  • MLP intermediate dimension: 5504
  • Utilizes Grouped-Query Attention (GQA)
  • Implements Rotary Position Embeddings (RoPE)
  • Learnable meta tokens prepended to the input sequence
  • Shares KV cache between layers and heads
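
The following is a minimal loading sketch, assuming the checkpoint is used through the Hugging Face transformers interface with trust_remote_code=True (the custom hybrid blocks ship with the checkpoint rather than the core library); the config attribute names printed at the end are assumptions and may differ in the released code.

```python
# Minimal sketch: loading Hymba-1.5B-Base via Hugging Face transformers.
# trust_remote_code=True is needed because the hybrid Mamba-attention blocks
# are implemented in the model repository, not in the transformers library.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Hymba-1.5B-Base"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 1.52B weights compact
    trust_remote_code=True,
).to("cuda")

# Inspect the configuration described above.
# NOTE: attribute names are assumptions based on common transformers configs.
cfg = model.config
print(cfg.hidden_size)        # expected: 1600
print(cfg.num_hidden_layers)  # expected: 32
```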

Core Capabilities

  • Superior performance compared to other sub-2B parameter models
  • Efficient text generation with parallel processing
  • Commercial-ready deployment
  • Flexible adaptation for various NLP tasks
  • Memory-efficient operation through KV cache sharing

Frequently Asked Questions

Q: What makes this model unique?

The hybrid architecture that runs Mamba and attention heads in parallel, combined with meta tokens and cross-layer KV sharing, makes the model both memory-efficient and strong for its size. Each input is processed by both head types simultaneously, so the model keeps the precise recall of attention alongside the efficient context summarization of SSMs at a relatively small parameter count.
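
As an illustration only, the parallel hybrid block can be pictured as an attention branch and an SSM-style branch reading the same hidden states, with their normalized outputs mixed by learnable per-branch weights. The sketch below is a toy stand-in, not NVIDIA's implementation: the "SSM" branch is a crude gated cumulative sum rather than a real Mamba head, and GQA, sliding window attention, and meta tokens are omitted.

```python
# Toy illustration of a parallel hybrid block: attention and an SSM-like
# branch process the same input, and their normalized outputs are fused with
# learnable weights. Not the Hymba implementation.
import torch
import torch.nn as nn

class ToyHybridBlock(nn.Module):
    def __init__(self, dim: int = 1600, heads: int = 25):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stand-in for a Mamba/SSM head: gated cumulative mixing along time.
        self.ssm_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)
        self.norm_attn = nn.LayerNorm(dim)
        self.norm_ssm = nn.LayerNorm(dim)
        self.beta_attn = nn.Parameter(torch.ones(dim))  # learnable fusion weights
        self.beta_ssm = nn.Parameter(torch.ones(dim))
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Attention branch (causal mask omitted for brevity).
        a, _ = self.attn(x, x, x, need_weights=False)
        # "SSM" branch: recurrent-like accumulation over the sequence,
        # gated by the input -- a crude proxy for a selective scan.
        s = torch.cumsum(self.ssm_proj(x), dim=1) * torch.sigmoid(self.gate(x))
        # Normalize each branch and fuse with learnable per-channel weights.
        fused = self.beta_attn * self.norm_attn(a) + self.beta_ssm * self.norm_ssm(s)
        return x + self.out(fused)

block = ToyHybridBlock()
hidden = torch.randn(1, 16, 1600)   # (batch, sequence, embedding dim)
print(block(hidden).shape)           # torch.Size([1, 16, 1600])
```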

Q: What are the recommended use cases?

Hymba-1.5B-Base is suited to general text generation tasks and can be adapted for commercial applications under the NVIDIA Open Model License. Note that generation currently supports only batch size 1, due to implementation constraints involving meta tokens and sliding window attention.
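
A minimal generation sketch that respects the batch-size-1 constraint, continuing from the loading snippet earlier; the sampling settings here are illustrative, not tuned recommendations.

```python
# Single-prompt generation (batch size 1), reusing `model` and `tokenizer`
# from the loading sketch above. Sampling parameters are illustrative only.
prompt = "The Hymba architecture combines"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```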
