Instella-3B
| Property | Value |
|---|---|
| Parameter Count | 3.11B |
| Context Length | 4096 tokens |
| Training Tokens | 4.15 trillion |
| License | ResearchRAIL |
| Model Type | Decoder-only Transformer |
| Architecture | 36 layers, 32 attention heads |
What is Instella-3B?
Instella-3B is AMD's fully open-source language model, trained from scratch on AMD Instinct MI300X GPUs. This 3-billion-parameter model outperforms existing fully open models of similar size and is competitive with state-of-the-art open-weight models such as Llama-3.2-3B and Gemma-2-2B.
Implementation Details
The model uses a multi-stage training approach, incorporating FlashAttention-2, torch.compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding. Training ran on 128 MI300X GPUs across 16 nodes (8 GPUs per node), using bfloat16 mixed precision.
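As a rough illustration of how these pieces fit together in PyTorch (a generic sketch, not the Instella training code; the stand-in model and the torchrun/NCCL launch assumptions are hypothetical):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

# Assumes a torchrun launch with one process per GPU.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Tiny stand-in for the 36-layer decoder stack (illustrative only).
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)]).cuda()

# bfloat16 mixed precision, as described above.
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# HYBRID_SHARD: shard parameters within each node, replicate across nodes.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)

# torch.compile traces and optimizes the forward/backward graphs.
model = torch.compile(model)
```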
- 36 decoder layers and 32 attention heads
- 4,096-token context length with a ~50,000-token vocabulary
- Trained on 4.15 trillion tokens across multiple stages
- Uses the OLMo tokenizer for efficient processing (the sketch after this list shows how these values can be checked from the released checkpoint)
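Assuming the checkpoint is published under amd/Instella-3B on the Hugging Face Hub (the repository name here is an assumption; adjust it to the actual release), the architecture and vocabulary figures can be inspected directly:

```python
from transformers import AutoConfig, AutoTokenizer

repo = "amd/Instella-3B"  # assumed repository name for the released checkpoint

# Custom model code ships with the checkpoint, hence trust_remote_code=True.
config = AutoConfig.from_pretrained(repo, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

# The printed config should show 36 layers, 32 heads, and a 4096-token context;
# exact field names depend on the released config class.
print(config)
print("vocab size:", len(tokenizer))  # roughly 50,000 entries for the OLMo tokenizer
```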
Core Capabilities
- Outperforms existing open models by 8.08% on average across benchmarks
- Exceptional performance in mathematical reasoning (GSM8K: +48.98% improvement)
- Strong instruction-following capabilities through supervised fine-tuning (SFT) and direct preference optimization (DPO); a sketch of the DPO objective follows this list
- Competitive with state-of-the-art open-weight models across a range of tasks
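The DPO stage trains the policy to prefer chosen over rejected responses relative to a frozen reference model. A minimal sketch of the loss (generic DPO, not Instella's exact training recipe; the β value and tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over response tokens) under the policy or reference model.
    """
    # Implicit reward: how much more likely each response is under the
    # policy than under the frozen reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage with random log-probabilities for 4 preference pairs.
batch = 4
loss = dpo_loss(torch.randn(batch), torch.randn(batch),
                torch.randn(batch), torch.randn(batch))
print(loss)
```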
Frequently Asked Questions
Q: What makes this model unique?
Instella-3B stands out for being fully open while achieving performance competitive with state-of-the-art open-weight models of similar size. It also demonstrates AMD's hardware capabilities for large-scale AI training and represents a significant step toward more accessible AI research.
Q: What are the recommended use cases?
The model excels in instruction following, mathematical reasoning, and general language understanding tasks. However, it's designed for research purposes only and should not be used in safety-critical situations or medical applications.
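A short generation sketch for the instruction-tuned variant (the amd/Instella-3B-Instruct repository name and the availability of a chat template are assumptions; adjust to the actual release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "amd/Instella-3B-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# A math-flavored prompt, matching the model's reported GSM8K strength.
messages = [{"role": "user",
             "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```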