Instella-3B-SFT

AMD's 3B parameter instruction-tuned LLM, trained on 8.9B tokens. Features 36 decoder layers, 32 attention heads, and 4K context length. Strong performance on reasoning tasks.

  • Model Type: Large Language Model (SFT)
  • Parameters: 3 billion
  • Context Length: 4,096 tokens
  • Training Tokens: 8.902 billion (×3 epochs)
  • License: ResearchRAIL
  • Developer: AMD

What is Instella-3B-SFT?

Instella-3B-SFT is a supervised fine-tuned language model from AMD's Instella family. It is the third stage of AMD's multi-stage training pipeline, built specifically to enhance instruction-following capabilities. The model was trained on AMD Instinct MI300X GPUs as part of AMD's effort to advance open-source AI development.

Implementation Details

The model uses 36 decoder layers and 32 attention heads, with a hidden size of 2560 and an MLP hidden size of 13824. Training employed FlashAttention-2, torch.compile, and bfloat16 mixed-precision training for efficient throughput and memory use.

  • Fully sharded data parallelism (FSDP) with hybrid sharding
  • OLMo tokenizer with a ~50,000-token vocabulary
  • Trained using AMD ROCm software stack
  • Supports sequence lengths up to 4,096 tokens
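As a rough sanity check, the architecture numbers above imply a parameter count near 3 billion. The sketch below assumes an OLMo-style gated (SwiGLU) MLP in which the stated MLP hidden size of 13824 is the fused gate-plus-up projection width, and tied input/output embeddings; both are assumptions for illustration, not confirmed details of the model.

```python
# Rough parameter-count estimate from the published architecture numbers.
# Assumes a SwiGLU MLP (13824 = fused gate+up width, so the down projection
# is 13824 // 2 wide) and tied embeddings -- both assumptions.

HIDDEN = 2560        # model hidden size
LAYERS = 36          # decoder layers
MLP_HIDDEN = 13824   # fused gate+up projection width (assumption)
VOCAB = 50_000       # approximate OLMo tokenizer vocabulary

def estimate_params() -> int:
    attn = 4 * HIDDEN * HIDDEN                               # Q, K, V, O projections
    mlp = HIDDEN * MLP_HIDDEN + (MLP_HIDDEN // 2) * HIDDEN   # gate+up, then down
    embeddings = VOCAB * HIDDEN                              # tied in/out embeddings
    return LAYERS * (attn + mlp) + embeddings

print(f"~{estimate_params() / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the advertised 3 billion parameters, which suggests the listed hyperparameters are internally consistent.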

Core Capabilities

  • Strong performance in knowledge recall and reasoning tasks
  • Enhanced instruction-following abilities through supervised fine-tuning
  • Competitive performance against leading open-weight models
  • Robust mathematical and logical reasoning capabilities
  • Effective multi-turn conversation handling
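The multi-turn chat capability above can be exercised with the Hugging Face transformers library. This is a minimal sketch, assuming the model is published under the repo id "amd/Instella-3B-SFT" with a custom architecture requiring trust_remote_code and a chat template; verify these details against the official model card before use.

```python
def chat(messages, max_new_tokens=256):
    """Generate a reply for a list of {"role": ..., "content": ...} messages.

    Sketch only: repo id, trust_remote_code, and chat-template support are
    assumptions about how the Instella family is published.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("amd/Instella-3B-SFT", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        "amd/Instella-3B-SFT",
        torch_dtype=torch.bfloat16,   # matches the bfloat16 training precision
        device_map="auto",
        trust_remote_code=True,
    )
    # Format the conversation with the model's chat template, then generate.
    inputs = tok.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True)

# Example call (requires a GPU and downloading the model weights):
# print(chat([{"role": "user", "content": "Explain FSDP in one sentence."}]))
```

Keeping the heavy imports inside the function means the module can be imported without pulling in torch or transformers until a generation is actually requested.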

Frequently Asked Questions

Q: What makes this model unique?

Instella-3B-SFT stands out for being a fully open-source model that achieves competitive performance against leading open-weight models. It demonstrates strong capabilities across various benchmarks while being trained on significantly fewer tokens than its competitors.

Q: What are the recommended use cases?

The model is best suited for research purposes and tasks requiring instruction following, knowledge recall, and reasoning capabilities. However, it's not recommended for safety-critical situations, health applications, or cases requiring high levels of factuality.
