StableLM 2 12B
| Property | Value |
|---|---|
| Parameter Count | 12.1B |
| Architecture | Decoder-only Transformer |
| Context Length | 4096 tokens |
| License | Stability AI Community License |
| Paper | Stable LM 2 Technical Report |
What is stablelm-2-12b?
StableLM 2 12B is a language model developed by Stability AI with 12.1 billion parameters, pre-trained on 2 trillion tokens of multilingual and code data. It supports seven languages: English, German, Spanish, French, Italian, Dutch, and Portuguese, making it suitable for multilingual text and code applications.
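For a quick orientation, the sketch below loads the model with Hugging Face transformers and generates a short continuation. The repository id `stabilityai/stablelm-2-12b` and the sampling settings are illustrative assumptions, not prescriptions from the report.

```python
# Minimal sketch: load StableLM 2 12B and generate a continuation.
# The repo id and sampling settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-2-12b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are released in BF16
    device_map="auto",
)

prompt = "The weather today is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```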
Implementation Details
The architecture consists of 40 layers with 32 attention heads (8 key-value heads) and a hidden size of 5120. It uses Rotary Position Embeddings, parallel attention and feed-forward residual blocks, and the Arcade100k tokenizer.
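These numbers can be checked against the published configuration without downloading the weights; a minimal sketch, assuming the same repository id used elsewhere in this card:

```python
# Sketch: fetch only the model config and print the architecture
# hyperparameters described above (repo id is an assumption).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("stabilityai/stablelm-2-12b")
print(config.num_hidden_layers)    # expected: 40
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 8
print(config.hidden_size)          # expected: 5120
```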
- Supports Flash Attention 2 for faster attention computation (see the loading sketch after this list)
- Uses BF16 precision for efficient computation
- Implements per-head QK normalization
- Features bias-free feed-forward networks
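The Flash Attention 2 and BF16 points above translate directly into load-time options in transformers. A minimal sketch, assuming the flash-attn package is installed, a compatible GPU, and the same repository id as before:

```python
# Sketch: enable Flash Attention 2 and BF16 at load time.
# Requires the flash-attn package and a supported GPU; repo id is an assumption.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-2-12b",
    torch_dtype=torch.bfloat16,               # matches the released BF16 weights
    attn_implementation="flash_attention_2",  # remove to use the default attention backend
    device_map="auto",
)
```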
Core Capabilities
- Multilingual text generation across seven languages
- Code generation and processing
- Context understanding up to 4096 tokens
- Efficient inference with Flash Attention 2 support
- Base model suitable for fine-tuning
Frequently Asked Questions
Q: What makes this model unique?
The model combines pre-training on 2 trillion tokens, support for seven languages, and modern architectural and implementation features such as Rotary Position Embeddings, per-head QK normalization, and Flash Attention 2 support, making it a capable base model for diverse applications.
Q: What are the recommended use cases?
The model is primarily intended as a base model for fine-tuning on specific applications. It is well suited to multilingual applications, code generation, and general text generation, though it requires evaluation and fine-tuning before safe deployment.
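Since the base model is meant to be fine-tuned before deployment, the sketch below outlines one common approach: parameter-efficient fine-tuning with LoRA via the peft library. The target module names, hyperparameters, and dataset are illustrative assumptions, not recommendations from the report.

```python
# Sketch: LoRA fine-tuning of the base model with peft + transformers Trainer.
# Hyperparameters, target modules, and the dataset are illustrative assumptions.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "stabilityai/stablelm-2-12b"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach low-rank adapters to the attention projections (module names assumed
# to follow the q_proj/k_proj/v_proj/o_proj convention of the HF implementation).
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

# Toy corpus: tokenize a small local text file into training examples.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="stablelm2-12b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The LoRA route keeps memory requirements manageable for a 12B model; full fine-tuning follows the same Trainer pattern but without the adapter step and with correspondingly higher hardware demands.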