StableLM-3B-4E1T
Property | Value |
---|---|
Parameter Count | 2.8B parameters |
Model Type | Decoder-only Transformer |
License | CC BY-SA-4.0 |
Training Data | 1 trillion tokens (4 epochs) |
Architecture | 32 layers, 32 heads, 2560 hidden size |
What is StableLM-3B-4E1T?
StableLM-3B-4E1T is a 3-billion-parameter base language model developed by Stability AI as a foundation for a wide range of NLP tasks. It was trained for 4 epochs over 1 trillion tokens drawn from high-quality open datasets, including Falcon RefinedWeb, RedPajama-Data, and StarCoder, and delivers strong results for a model of its compact size.
Implementation Details
The model follows a decoder-only transformer architecture similar to LLaMA, with a few modifications for performance and efficiency: Rotary Position Embeddings (RoPE) are applied to the first 25% of each head's embedding dimensions, and normalization uses LayerNorm with learned bias terms rather than RMSNorm.
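To make the partial rotary scheme concrete, here is a minimal sketch (an illustrative approximation written for this page, not Stability AI's implementation) that applies GPT-NeoX-style RoPE to only the first 25% of each head's dimensions; with a head size of 80 (hidden size 2560 across 32 heads), that means 20 rotated dimensions per head.

```python
import torch

def apply_partial_rotary(x: torch.Tensor, rotary_pct: float = 0.25, base: float = 10_000.0) -> torch.Tensor:
    """Rotate only the first `rotary_pct` of each head's dimensions (illustrative sketch)."""
    batch, seq_len, n_heads, head_dim = x.shape
    rot_dim = int(head_dim * rotary_pct)            # e.g. 80 * 0.25 = 20 dims per head
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    # Standard RoPE frequencies, computed over the rotated slice only.
    inv_freq = 1.0 / (base ** (torch.arange(0, rot_dim, 2, dtype=torch.float32) / rot_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(positions, inv_freq)        # (seq_len, rot_dim / 2)
    emb = torch.cat((freqs, freqs), dim=-1)         # (seq_len, rot_dim)
    cos = emb.cos()[None, :, None, :]               # broadcast over batch and heads
    sin = emb.sin()[None, :, None, :]

    # GPT-NeoX-style "rotate half" of the rotated slice.
    x1, x2 = x_rot[..., : rot_dim // 2], x_rot[..., rot_dim // 2 :]
    rotated = torch.cat((-x2, x1), dim=-1)

    x_rot = x_rot * cos + rotated * sin
    return torch.cat((x_rot, x_pass), dim=-1)       # remaining dims pass through unchanged
```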
Additional training details:
- Context length of 4,096 tokens
- Trained with FlashAttention-2 for efficient attention computation
- BF16 mixed-precision training
- Optimized with AdamW
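For reference, a minimal usage sketch is shown below. It assumes the model is published on the Hugging Face Hub as `stabilityai/stablelm-3b-4e1t` and that a recent version of `transformers` supports the architecture; the sampling settings are arbitrary illustrations, and bfloat16 loading mirrors the BF16 training precision noted above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed to be the official Hugging Face Hub release.
model_id = "stabilityai/stablelm-3b-4e1t"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 training precision
    device_map="auto",            # requires `accelerate`; drop for CPU-only use
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```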
Core Capabilities
- Strong performance on HellaSwag (75.94% accuracy)
- Solid reasoning capabilities with 46.59% accuracy on the AI2 Reasoning Challenge
- Effective on Winogrande tasks (71.19% accuracy)
- 45.23% accuracy on MMLU
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its efficient architecture and its extensive training regimen of 4 epochs over 1 trillion tokens, which makes it particularly suitable as a base model for fine-tuning. Its balance of compact size and strong benchmark performance makes it a good fit for a wide range of NLP applications.
Q: What are the recommended use cases?
The model is primarily intended as a foundation for fine-tuning on specific downstream tasks. It is well suited to text generation and language understanding, and it can be adapted to a variety of applications through task-specific fine-tuning (see the sketch below).
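As one possible route for such adaptation, the sketch below attaches LoRA adapters with the `peft` library and runs a short training pass with the Hugging Face `Trainer`. The repository id, the `q_proj`/`v_proj` target module names, the toy WikiText data, and all hyperparameters are illustrative assumptions, not recommendations from the model card.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "stabilityai/stablelm-3b-4e1t"   # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach small LoRA adapters instead of updating all ~2.8B parameters.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],    # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Toy dataset for illustration; replace with task-specific data.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="stablelm-3b-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```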