Instella-3B-SFT
| Property | Value |
|---|---|
| Model Type | Large Language Model (SFT) |
| Parameters | 3 billion |
| Context Length | 4,096 tokens |
| Training Tokens | 8.902 billion (×3 epochs) |
| License | ResearchRAIL |
| Developer | AMD |
What is Instella-3B-SFT?
Instella-3B-SFT is a supervised fine-tuned language model developed by AMD as part of its Instella family of models. It represents the third stage in AMD's multi-stage training pipeline, designed specifically to strengthen instruction-following capabilities. The model was trained on AMD Instinct MI300X GPUs, demonstrating AMD's commitment to advancing open-source AI development.
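The snippet below is a minimal sketch of loading the model and generating text with Hugging Face Transformers. The repository id `amd/Instella-3B-SFT` and the use of `trust_remote_code=True` are assumptions based on common practice for custom architectures; check the official model card for the authoritative usage instructions.

```python
# Minimal loading-and-generation sketch. The repo id and trust_remote_code
# usage are assumptions; consult the official Instella model card before
# relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-SFT"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 precision used in training
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Summarize the difference between pretraining and supervised fine-tuning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```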
Implementation Details
The model uses a decoder-only transformer architecture with 36 decoder layers, 32 attention heads, a hidden size of 2560, and an MLP hidden size of 13824. Training employs FlashAttention-2, Torch Compile, and bfloat16 mixed precision for efficient compute and memory utilization; the key settings are collected in the sketch after the list below.
- Fully sharded data parallelism (FSDP) with hybrid sharding
- OLMo tokenizer with a vocabulary of roughly 50,000 tokens
- Trained using the AMD ROCm software stack
- Supports sequence lengths up to 4,096 tokens
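For reference, the architecture and training details reported above can be collected in one place. This is only a summary of the numbers in this card; the key names are illustrative and do not necessarily match the fields in the model's `config.json`.

```python
# Summary of the hyperparameters reported in this card. Key names are
# illustrative, not the actual config.json fields.
INSTELLA_3B_SFT = {
    "parameters": "3 billion",
    "decoder_layers": 36,
    "attention_heads": 32,
    "hidden_size": 2560,
    "mlp_hidden_size": 13824,
    "max_sequence_length": 4096,
    "tokenizer": "OLMo (~50,000-token vocabulary)",
    "precision": "bfloat16 mixed precision",
    "attention_kernel": "FlashAttention-2",
    "parallelism": "FSDP with hybrid sharding",
    "software_stack": "AMD ROCm",
}
```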
Core Capabilities
- Strong performance in knowledge recall and reasoning tasks
- Enhanced instruction-following abilities through supervised fine-tuning
- Competitive performance against leading open-weight models
- Robust mathematical and logical reasoning capabilities
- Effective multi-turn conversation handling
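As an illustration of multi-turn handling, the sketch below reuses the `model` and `tokenizer` from the loading example above and assumes the tokenizer ships a chat template, which is typical for SFT checkpoints but should be verified against the official model card.

```python
# Multi-turn conversation sketch. Assumes `model` and `tokenizer` from the
# loading example above, and that the tokenizer defines a chat template.
messages = [
    {"role": "user", "content": "What does supervised fine-tuning change in a base model?"},
    {"role": "assistant", "content": "It adapts the pretrained model to follow instructions using curated prompt-response pairs."},
    {"role": "user", "content": "Give a one-sentence example of such a pair."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```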
Frequently Asked Questions
Q: What makes this model unique?
Instella-3B-SFT stands out for being a fully open-source model that achieves competitive performance against leading open-weight models. It demonstrates strong capabilities across various benchmarks while being trained on significantly fewer tokens than its competitors.
Q: What are the recommended use cases?
The model is best suited for research purposes and tasks requiring instruction following, knowledge recall, and reasoning capabilities. However, it's not recommended for safety-critical situations, health applications, or cases requiring high levels of factuality.