StableLM 2 1.6B
| Property | Value |
|---|---|
| Parameter Count | 1.64B |
| Architecture | Decoder-only Transformer |
| Training Data | 2 trillion tokens |
| License | Stability AI Community License |
| Supported Languages | English, German, Spanish, French, Italian, Dutch, Portuguese |
| Paper | Stable LM 2 1.6B Technical Report |
What is stablelm-2-1_6b?
StableLM 2 1.6B is a state-of-the-art small decoder-only language model developed by Stability AI, pre-trained on 2 trillion tokens of diverse multilingual text and code. The architecture comprises 24 transformer layers, 32 attention heads, and a hidden size of 2048; a quick way to check these hyperparameters locally is shown in the sketch below.
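A minimal sketch for inspecting those hyperparameters with the Hugging Face transformers library; the repo id stabilityai/stablelm-2-1_6b is assumed from the official release, and the expected values in the comments simply mirror the figures quoted above.

```python
from transformers import AutoConfig

# Assumed official repo id on Hugging Face; recent transformers versions
# include native StableLM support (older ones need trust_remote_code=True).
config = AutoConfig.from_pretrained("stabilityai/stablelm-2-1_6b")

print(config.num_hidden_layers)        # expected: 24
print(config.num_attention_heads)      # expected: 32
print(config.hidden_size)              # expected: 2048
print(config.max_position_embeddings)  # expected: 4096
print(config.vocab_size)               # expected: 100352
```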
Implementation Details
The model uses Rotary Position Embeddings (applied to the first 25% of head embedding dimensions), LayerNorm with learned bias terms rather than RMSNorm, and retains bias terms only on the query, key, and value projections, with all other attention and feed-forward biases removed. It supports Flash Attention 2 for faster attention and employs the Arcade100k tokenizer with a vocabulary size of 100,352. A minimal loading sketch follows the list below.
- 2048 hidden size with 24 layers and 32 attention heads
- 4096-token context length
- Trained on multiple high-quality datasets, including Falcon RefinedWeb and RedPajama-Data-1T
- Supports Flash Attention 2 for faster training and inference
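A minimal loading sketch under the same assumptions (official stabilityai/stablelm-2-1_6b repo, a recent transformers release); Flash Attention 2 additionally requires the flash-attn package and a supported NVIDIA GPU, so treat that argument as optional.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "stabilityai/stablelm-2-1_6b"  # assumed official Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,               # half precision keeps the 1.6B weights compact
    attn_implementation="flash_attention_2",  # needs flash-attn and a supported GPU
    device_map="auto",                        # place weights on the available device
)
```

Dropping the attn_implementation argument falls back to the default attention kernels on machines without flash-attn installed.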
Core Capabilities
- Multilingual text generation across 7 languages
- Code generation and processing
- Text completion with configurable sampling parameters (see the generation sketch below)
- Designed as a base model for fine-tuning on downstream tasks
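To illustrate text completion with configurable sampling parameters, here is a self-contained sketch; the prompt and sampling values are arbitrary examples, not tuned defaults.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "stabilityai/stablelm-2-1_6b"  # assumed official repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The weather today is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,  # higher values -> more diverse completions
    top_p=0.95,       # nucleus sampling cutoff
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```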
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for combining multilingual coverage across seven languages with a compact 1.6B-parameter footprint. Its optimized attention mechanisms, including Flash Attention 2 support, keep it practical to run and fine-tune on modest hardware.
Q: What are the recommended use cases?
The model is primarily intended as a base model for fine-tuning in specific applications. It's particularly suitable for multilingual text generation, code-related tasks, and can be adapted for various downstream applications after appropriate fine-tuning and safety evaluations.
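As one illustration of the fine-tuning path, the sketch below attaches LoRA adapters with the peft library; this is a common parameter-efficient recipe, not the procedure from the technical report. The target module names (q_proj, k_proj, v_proj) assume the Hugging Face StableLM implementation, and train.txt stands in for your own corpus.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_ID = "stabilityai/stablelm-2-1_6b"  # assumed official repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # batching requires a pad token

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# LoRA on the attention projections; module names assume the HF StableLM code.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# "train.txt" is a placeholder for your own plain-text training corpus.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stablelm2-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1, bf16=True),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Training only the low-rank adapters keeps memory requirements far below full fine-tuning of the 1.6B weights, which is why this route is a common starting point for a model of this size.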