Instella-3B
| Property | Value |
|---|---|
| Parameter Count | 3.11B |
| Context Length | 4096 tokens |
| Training Tokens | 4.15 trillion |
| License | ResearchRAIL |
| Model Type | Decoder-only Transformer |
| Architecture | 36 layers, 32 attention heads |
What is Instella-3B?
Instella-3B is AMD's fully open-source language model, trained from scratch on AMD Instinct MI300X GPUs. This 3-billion-parameter model outperforms existing fully open models of similar size and is competitive with state-of-the-art open-weight models such as Llama-3.2-3B and Gemma-2-2B.
Implementation Details
The model uses a multi-stage training approach, incorporating FlashAttention-2, torch.compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding. Training ran on 128 MI300X GPUs across 16 nodes (8 GPUs per node), using bfloat16 mixed precision.
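As a rough illustration of how these pieces fit together in PyTorch (a generic sketch, not the Instella training code; the stand-in model and the torchrun/NCCL launch assumptions are hypothetical):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Assumes a torchrun launch with one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Tiny stand-in for the 36-layer decoder stack (illustrative only).
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)]).cuda()

# bfloat16 mixed precision, as described above.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# HYBRID_SHARD: shard parameters within each node, replicate across nodes.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)

# torch.compile traces and optimizes the forward/backward graphs.
model = torch.compile(model)
```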
- 36 decoder layers and 32 attention heads
- 4,096-token context length with a ~50,000-token vocabulary
- Trained on 4.15 trillion tokens across multiple stages
- Uses the OLMo tokenizer for efficient processing (the sketch after this list shows how these values can be checked from the released checkpoint)
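Assuming the checkpoint is published under amd/Instella-3B on the Hugging Face Hub (the repository name here is an assumption; adjust it to the actual release), the architecture and vocabulary figures can be inspected directly:

```python
from transformers import AutoConfig, AutoTokenizer

repo = "amd/Instella-3B"  # assumed repository name for the released checkpoint

# Custom model code ships with the checkpoint, hence trust_remote_code=True.
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

# The printed config should show 36 layers, 32 heads, and a 4096-token context;
# exact field names depend on the released config class.
print(config)
print("vocab size:", len(tokenizer))  # roughly 50,000 entries for the OLMo tokenizer
```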
Core Capabilities
- Outperforms existing open models by 8.08% on average across benchmarks
- Exceptional performance in mathematical reasoning (GSM8K: +48.98% improvement)
- Strong instruction-following capabilities through supervised fine-tuning (SFT) and direct preference optimization (DPO); a sketch of the DPO objective follows this list
- Competitive with state-of-the-art open-weight models across a range of tasks
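The DPO stage trains the policy to prefer chosen over rejected responses relative to a frozen reference model. A minimal sketch of the loss (generic DPO, not Instella's exact training recipe; the β value and tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over response tokens) under the policy or reference model.
    """
    # Implicit reward: how much more likely each response is under the
    # policy than under the frozen reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage with random log-probabilities for 4 preference pairs.
batch = 4
loss = dpo_loss(torch.randn(batch), torch.randn(batch),
                torch.randn(batch), torch.randn(batch))
print(loss)
```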
Frequently Asked Questions
Q: What makes this model unique?
Instella-3B stands out for being fully open while achieving performance competitive with state-of-the-art open-weight models of similar size. It also demonstrates AMD's hardware capabilities for large-scale AI training and represents a significant step toward more accessible AI research.
Q: What are the recommended use cases?
The model excels in instruction following, mathematical reasoning, and general language understanding tasks. However, it's designed for research purposes only and should not be used in safety-critical situations or medical applications.
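A short generation sketch for the instruction-tuned variant (the amd/Instella-3B-Instruct repository name and the availability of a chat template are assumptions; adjust to the actual release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "amd/Instella-3B-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# A math-flavored prompt, matching the model's reported GSM8K strength.
messages = [{"role": "user",
             "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```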