StableLM-3B-4E1T

Maintained By: stabilityai

Property          Value
Parameter Count   2.8B parameters
Model Type        Decoder-only Transformer
License           CC BY-SA 4.0
Training Data     1 trillion tokens
Architecture      32 layers, 32 attention heads, 2560 hidden size

What is stablelm-3b-4e1t?

StableLM-3B-4E1T is a language model developed by Stability AI, designed as a base model for a range of NLP tasks. Trained on 1 trillion tokens across multiple high-quality datasets, including Falcon RefinedWeb, RedPajama-Data, and StarCoder, the model represents a significant step forward for compact yet capable language models.
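
For orientation, here is a minimal text-generation sketch using the Hugging Face transformers API (assuming transformers v4.38+, which added native StableLM support; the sampling settings are illustrative, not official recommendations):

```python
# Minimal sketch: load the base model and sample a completion.
# Assumes transformers >= 4.38 and enough memory for a 2.8B model in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
)
model.eval()

inputs = tokenizer("The weather is always wonderful", return_tensors="pt")
with torch.no_grad():
    tokens = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,       # illustrative sampling settings
        temperature=0.75,
        top_p=0.95,
    )
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```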

Implementation Details

The model follows a decoder-only transformer architecture similar to LLaMA, with specific modifications for performance and efficiency: it applies Rotary Position Embeddings to the first 25% of head embedding dimensions and uses LayerNorm with learned bias terms rather than RMSNorm. A toy sketch of the partial rotary scheme follows the list below.

  • 4096-token sequence length
  • Trained with FlashAttention-2 for fast attention computation
  • BF16 precision training
  • Optimized with AdamW

Core Capabilities

  • Strong commonsense performance on HellaSwag (75.94% accuracy)
  • Solid reasoning with 46.59% on the AI2 Reasoning Challenge (ARC)
  • Effective pronoun resolution on Winogrande (71.19% accuracy)
  • 45.23% on MMLU (see the evaluation sketch below)
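
These are zero-shot benchmark scores. One way to reproduce numbers of this kind is EleutherAI's lm-evaluation-harness; the sketch below assumes lm-eval v0.4+ and its simple_evaluate entry point, and exact scores will vary with harness version and prompt formatting.

```python
# Hedged sketch: zero-shot evaluation via EleutherAI's lm-evaluation-harness
# (pip install lm-eval; assumes the v0.4+ Python API).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=stabilityai/stablelm-3b-4e1t,dtype=bfloat16",
    tasks=["hellaswag", "arc_challenge", "winogrande", "mmlu"],
    batch_size=8,
)
print(results["results"])  # per-task accuracy metrics
```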

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient architecture and extensive training regimen of 4 epochs on 1 trillion tokens, making it particularly suitable as a base model for fine-tuning. Its combination of size and performance makes it an excellent choice for various NLP applications.

Q: What are the recommended use cases?

The model is primarily intended as a foundation for fine-tuning on specific tasks. It is well suited to text generation and language understanding, and can be adapted to a wide range of downstream applications through fine-tuning.
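
As an illustration of that fine-tuning path, the sketch below attaches LoRA adapters with the peft library. The hyperparameters and the projection-module names are assumptions based on the StableLM architecture as implemented in transformers, not official guidance; dataset loading and the training loop are omitted.

```python
# Minimal illustrative sketch: parameter-efficient fine-tuning with LoRA.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t", torch_dtype=torch.bfloat16
)
lora_cfg = LoraConfig(
    r=16,                       # assumed adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters train
```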
