# Instella-3B-Instruct
| Property | Value |
|---|---|
| Parameter Count | 3.11B |
| Context Length | 4,096 tokens |
| Architecture | 36 decoder layers, 32 attention heads |
| License | ResearchRAIL |
| Training Tokens | 4.15 trillion |
## What is Instella-3B-Instruct?
Instella-3B-Instruct is AMD's latest instruction-tuned language model, developed as part of their commitment to open-source AI research. Trained on AMD Instinct MI300X GPUs, this model represents a significant advancement in fully open language models, achieving performance that rivals closed-source competitors while maintaining complete transparency in its development process.
## Implementation Details
The model uses a transformer-based architecture with 36 decoder layers and 32 attention heads. Training employed efficiency techniques including FlashAttention-2, Torch Compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding. The training process ran in multiple stages: first-stage pre-training (4.065T tokens), second-stage pre-training (57.575B tokens), supervised fine-tuning (SFT), and direct preference optimization (DPO).
- Vocabulary of ~50,000 tokens using the OLMo tokenizer
- 4,096-token context length
- bfloat16 mixed-precision training
- Trained across 128 Instinct MI300X GPUs
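The DPO stage mentioned above optimizes the policy directly on preference pairs, without a separate reward model. The following is a minimal, self-contained sketch of the DPO loss for a single preference pair; the inputs (summed log-probabilities of each response) and the `beta` value are illustrative, not taken from Instella's training recipe.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response by a wide margin over the rejected one.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more
# strongly (relative to the reference) than the rejected one.
strong_pref = dpo_loss(-10.0, -40.0, -20.0, -30.0)
weak_pref = dpo_loss(-25.0, -26.0, -20.0, -30.0)
```

In real training this loss is averaged over a batch of preference pairs and backpropagated through the policy's log-probabilities only, with the reference model held fixed.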
## Core Capabilities
- Outperforms existing fully open models by 14.37% on average
- Competitive performance with Llama-3.2-3B and Qwen2.5-3B
- Strong performance in instruction following and multi-turn QA tasks
- Enhanced capabilities in mathematical reasoning and knowledge recall
## Frequently Asked Questions
**Q: What makes this model unique?**
Instella-3B-Instruct stands out for being fully open-source while achieving performance comparable to closed-source models. It's trained using AMD's MI300X GPUs and implements state-of-the-art training techniques, making it a significant milestone in open AI development.
**Q: What are the recommended use cases?**
The model excels in instruction following, multi-turn QA tasks, and mathematical reasoning. However, it's intended for research purposes only and should not be used in safety-critical situations, health applications, or scenarios requiring high factual accuracy.
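For multi-turn QA, prompts are typically assembled from alternating role-tagged turns. The sketch below illustrates that assembly with a hypothetical tag format; the model's actual chat template ships with its tokenizer (e.g., `apply_chat_template` in Hugging Face transformers) and should be used instead of hand-rolled tags in practice.

```python
def build_chat_prompt(turns, system=None):
    """Assemble a multi-turn prompt from (role, text) pairs.

    NOTE: the <|role|> tag format here is purely illustrative and is
    NOT Instella's real chat template, which is defined by the
    tokenizer that ships with the model.
    """
    parts = []
    if system:
        parts.append(f"<|system|>\n{system}")
    for role, text in turns:
        parts.append(f"<|{role}|>\n{text}")
    # Trailing assistant tag cues the model to generate its reply.
    parts.append("<|assistant|>\n")
    return "\n".join(parts)

prompt = build_chat_prompt(
    [("user", "What is 12 * 9?"),
     ("assistant", "12 * 9 = 108."),
     ("user", "And divided by 4?")],
    system="You are a helpful assistant.",
)
```

Keeping the full turn history in the prompt is what gives the model the context needed for follow-up questions like the last one above.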