StableLM-3B-4E1T
Property | Value |
---|---|
Parameter Count | 2.8B parameters |
Model Type | Decoder-only Transformer |
License | CC BY-SA-4.0 |
Training Data | 1 trillion tokens (4 epochs) |
Architecture | 32 layers, 32 heads, 2560 hidden size |
What is StableLM-3B-4E1T?
StableLM-3B-4E1T is a 3-billion-parameter base language model developed by Stability AI as a foundation for a wide range of NLP tasks. It was trained for 4 epochs over 1 trillion tokens drawn from high-quality open datasets, including Falcon RefinedWeb, RedPajama-Data, and StarCoder, and delivers strong results for a model of its compact size.
Implementation Details
The model follows a decoder-only transformer architecture similar to LLaMA, with a few modifications for performance and efficiency: Rotary Position Embeddings (RoPE) are applied to the first 25% of each head's embedding dimensions, and normalization uses LayerNorm with learned bias terms rather than RMSNorm.
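To make the partial rotary scheme concrete, here is a minimal sketch (an illustrative approximation written for this page, not Stability AI's implementation) that applies GPT-NeoX-style RoPE to only the first 25% of each head's dimensions; with a head size of 80 (hidden size 2560 across 32 heads), that means 20 rotated dimensions per head.

```python
import torch

def apply_partial_rotary(x: torch.Tensor, rotary_pct: float = 0.25, base: float = 10_000.0) -> torch.Tensor:
    """Rotate only the first `rotary_pct` of each head's dimensions (illustrative sketch)."""
    batch, seq_len, n_heads, head_dim = x.shape
    rot_dim = int(head_dim * rotary_pct)            # e.g. 80 * 0.25 = 20 dims per head
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    # Standard RoPE frequencies, computed over the rotated slice only.
    inv_freq = 1.0 / (base ** (torch.arange(0, rot_dim, 2, dtype=torch.float32) / rot_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)        # (seq_len, rot_dim / 2)
    emb = torch.cat((freqs, freqs), dim=-1)         # (seq_len, rot_dim)
    cos = emb.cos()[None, :, None, :]               # broadcast over batch and heads
    sin = emb.sin()[None, :, None, :]

    # GPT-NeoX-style "rotate half" of the rotated slice.
    x1, x2 = x_rot[..., : rot_dim // 2], x_rot[..., rot_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)

    x_rot = x_rot * cos + rotated * sin
    return torch.cat((x_rot, x_pass), dim=-1)       # remaining dims pass through unchanged
```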
Additional training details:
- Context length of 4,096 tokens
- Trained with FlashAttention-2 for efficient attention computation
- BF16 mixed-precision training
- Optimized with AdamW
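For reference, a minimal usage sketch is shown below. It assumes the model is published on the Hugging Face Hub as `stabilityai/stablelm-3b-4e1t` and that a recent version of `transformers` supports the architecture; the sampling settings are arbitrary illustrations, and bfloat16 loading mirrors the BF16 training precision noted above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed to be the official Hugging Face Hub release.
model_id = "stabilityai/stablelm-3b-4e1t"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 training precision
    device_map="auto",            # requires `accelerate`; drop for CPU-only use
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```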
Core Capabilities
- Strong performance on HellaSwag (75.94% accuracy)
- Solid reasoning capabilities with 46.59% accuracy on the AI2 Reasoning Challenge
- Effective on Winogrande tasks (71.19% accuracy)
- 45.23% accuracy on MMLU
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and its extensive training regimen of 4 epochs over 1 trillion tokens, which makes it particularly suitable as a base model for fine-tuning. Its balance of compact size and strong benchmark performance makes it a good fit for a wide range of NLP applications.
Q: What are the recommended use cases?
The model is primarily intended as a foundation for fine-tuning on specific downstream tasks. It is well suited to text generation and language understanding, and it can be adapted to a variety of applications through task-specific fine-tuning (see the sketch below).
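As one possible route for such adaptation, the sketch below attaches LoRA adapters with the `peft` library and runs a short training pass with the Hugging Face `Trainer`. The repository id, the `q_proj`/`v_proj` target module names, the toy WikiText data, and all hyperparameters are illustrative assumptions, not recommendations from the model card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "stabilityai/stablelm-3b-4e1t"   # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach small LoRA adapters instead of updating all ~2.8B parameters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],    # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy dataset for illustration; replace with task-specific data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="stablelm-3b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```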