Instella-3B-SFT
| Property | Value |
|---|---|
| Model Type | Large Language Model (SFT) |
| Parameters | 3 billion |
| Context Length | 4,096 tokens |
| Training Tokens | 8.902 billion (×3 epochs) |
| License | ResearchRAIL |
| Developer | AMD |
What is Instella-3B-SFT?
Instella-3B-SFT is a supervised fine-tuned language model developed by AMD as part of its Instella family of models. It represents the third stage in AMD's multi-stage training pipeline, designed specifically to strengthen instruction-following capabilities. The model was trained on AMD Instinct MI300X GPUs, demonstrating AMD's commitment to advancing open-source AI development.
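The snippet below is a minimal sketch of loading the model and generating text with Hugging Face Transformers. The repository id `amd/Instella-3B-SFT` and the use of `trust_remote_code=True` are assumptions based on common practice for custom architectures; check the official model card for the authoritative usage instructions.

```python
# Minimal loading-and-generation sketch. The repo id and trust_remote_code
# usage are assumptions; consult the official Instella model card before
# relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-SFT"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision used in training
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the difference between pretraining and supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```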
Implementation Details
The model uses a decoder-only transformer architecture with 36 decoder layers, 32 attention heads, a hidden size of 2560, and an MLP hidden size of 13824. Training employs FlashAttention-2, Torch Compile, and bfloat16 mixed precision for efficient compute and memory utilization; the key settings are collected in the sketch after the list below.
- Fully sharded data parallelism (FSDP) with hybrid sharding
- OLMo tokenizer with a vocabulary of roughly 50,000 tokens
- Trained using the AMD ROCm software stack
- Supports sequence lengths up to 4,096 tokens
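For reference, the architecture and training details reported above can be collected in one place. This is only a summary of the numbers in this card; the key names are illustrative and do not necessarily match the fields in the model's `config.json`.

```python
# Summary of the hyperparameters reported in this card. Key names are
# illustrative, not the actual config.json fields.
INSTELLA_3B_SFT = {
    "parameters": "3 billion",
    "decoder_layers": 36,
    "attention_heads": 32,
    "hidden_size": 2560,
    "mlp_hidden_size": 13824,
    "max_sequence_length": 4096,
    "tokenizer": "OLMo (~50,000-token vocabulary)",
    "precision": "bfloat16 mixed precision",
    "attention_kernel": "FlashAttention-2",
    "parallelism": "FSDP with hybrid sharding",
    "software_stack": "AMD ROCm",
}
```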
Core Capabilities
- Strong performance in knowledge recall and reasoning tasks
- Enhanced instruction-following abilities through supervised fine-tuning
- Competitive performance against leading open-weight models
- Robust mathematical and logical reasoning capabilities
- Effective multi-turn conversation handling
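As an illustration of multi-turn handling, the sketch below reuses the `model` and `tokenizer` from the loading example above and assumes the tokenizer ships a chat template, which is typical for SFT checkpoints but should be verified against the official model card.

```python
# Multi-turn conversation sketch. Assumes `model` and `tokenizer` from the
# loading example above, and that the tokenizer defines a chat template.
messages = [
    {"role": "user", "content": "What does supervised fine-tuning change in a base model?"},
    {"role": "assistant", "content": "It adapts the pretrained model to follow instructions using curated prompt-response pairs."},
    {"role": "user", "content": "Give a one-sentence example of such a pair."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```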
Frequently Asked Questions
Q: What makes this model unique?
Instella-3B-SFT stands out for being a fully open-source model that achieves competitive performance against leading open-weight models. It demonstrates strong capabilities across various benchmarks while being trained on significantly fewer tokens than its competitors.
Q: What are the recommended use cases?
The model is best suited for research purposes and tasks requiring instruction following, knowledge recall, and reasoning capabilities. However, it's not recommended for safety-critical situations, health applications, or cases requiring high levels of factuality.