StableLM 2 12B
| Property | Value |
|---|---|
| Parameter Count | 12.1B |
| Architecture | Decoder-only Transformer |
| Context Length | 4096 tokens |
| License | Stability AI Community License |
| Paper | Stable LM 2 Technical Report |
What is stablelm-2-12b?
StableLM 2 12B is a language model developed by Stability AI with 12.1 billion parameters, pre-trained on 2 trillion tokens of multilingual and code data. It supports seven languages: English, German, Spanish, French, Italian, Dutch, and Portuguese, making it suitable for multilingual text and code applications.
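For a quick orientation, the sketch below loads the model with Hugging Face transformers and generates a short continuation. The repository id `stabilityai/stablelm-2-12b` and the sampling settings are illustrative assumptions, not prescriptions from the report.

```python
# Minimal sketch: load StableLM 2 12B and generate a continuation.
# The repo id and sampling settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-12b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are released in BF16
    device_map="auto",
)

prompt = "The weather today is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```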
Implementation Details
The architecture consists of 40 layers with 32 attention heads (8 key-value heads) and a hidden size of 5120. It uses Rotary Position Embeddings, parallel attention and feed-forward residual blocks, and the Arcade100k tokenizer.
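These numbers can be checked against the published configuration without downloading the weights; a minimal sketch, assuming the same repository id used elsewhere in this card:

```python
# Sketch: fetch only the model config and print the architecture
# hyperparameters described above (repo id is an assumption).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("stabilityai/stablelm-2-12b")
print(config.num_hidden_layers)    # expected: 40
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 8
print(config.hidden_size)          # expected: 5120
```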
- Supports Flash Attention 2 for faster attention computation (see the loading sketch after this list)
- Uses BF16 precision for efficient computation
- Implements per-head QK normalization
- Features bias-free feed-forward networks
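The Flash Attention 2 and BF16 points above translate directly into load-time options in transformers. A minimal sketch, assuming the flash-attn package is installed, a compatible GPU, and the same repository id as before:

```python
# Sketch: enable Flash Attention 2 and BF16 at load time.
# Requires the flash-attn package and a supported GPU; repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-2-12b",
    torch_dtype=torch.bfloat16,               # matches the released BF16 weights
    attn_implementation="flash_attention_2",  # remove to use the default attention backend
    device_map="auto",
)
```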
Core Capabilities
- Multilingual text generation across seven languages
- Code generation and processing
- Context understanding up to 4096 tokens
- Efficient inference with Flash Attention 2 support
- Base model suitable for fine-tuning
Frequently Asked Questions
Q: What makes this model unique?
The model combines pre-training on 2 trillion tokens, support for seven languages, and modern architectural and implementation features such as Rotary Position Embeddings, per-head QK normalization, and Flash Attention 2 support, making it a capable base model for diverse applications.
Q: What are the recommended use cases?
The model is primarily intended as a base model for fine-tuning on specific applications. It is well suited to multilingual applications, code generation, and general text generation, though it requires evaluation and fine-tuning before safe deployment.
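Since the base model is meant to be fine-tuned before deployment, the sketch below outlines one common approach: parameter-efficient fine-tuning with LoRA via the peft library. The target module names, hyperparameters, and dataset are illustrative assumptions, not recommendations from the report.

```python
# Sketch: LoRA fine-tuning of the base model with peft + transformers Trainer.
# Hyperparameters, target modules, and the dataset are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "stabilityai/stablelm-2-12b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach low-rank adapters to the attention projections (module names assumed
# to follow the q_proj/k_proj/v_proj/o_proj convention of the HF implementation).
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

# Toy corpus: tokenize a small local text file into training examples.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="stablelm2-12b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The LoRA route keeps memory requirements manageable for a 12B model; full fine-tuning follows the same Trainer pattern but without the adapter step and with correspondingly higher hardware demands.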