Instella-3B-Stage1

Maintained by: amd

Property          Value
Parameter Count   3.11B
Training Tokens   4.065T
License           ResearchRAIL
Architecture      36 layers, 32 attention heads
Context Length    4096 tokens
Model Type        Causal Language Model

What is Instella-3B-Stage1?

Instella-3B-Stage1 is AMD's first-stage pre-trained language model, developed as part of the company's commitment to open-source AI. It was trained on AMD Instinct™ MI300X GPUs and is the initial phase of a multi-stage training approach, establishing a foundation in natural language understanding for the later stages.

Implementation Details

The model has 36 decoder layers and 32 attention heads, with a model hidden size of 2560 and an MLP hidden size of 13824. Training used FlashAttention-2, Torch Compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding.
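The architecture figures above can be collected into a small configuration object. This is a minimal sketch, not the model's actual configuration class: the class name and field names are hypothetical, and the per-head dimension assumes the hidden size is split evenly across the attention heads, as in standard multi-head attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstellaStage1Config:
    """Hypothetical container; field names are illustrative only.

    The default values are the figures quoted in the model card.
    """

    num_layers: int = 36
    num_attention_heads: int = 32
    hidden_size: int = 2560
    mlp_hidden_size: int = 13824
    max_context_length: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumption: the hidden size is divided evenly across heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


config = InstellaStage1Config()
print(config.head_dim)  # 2560 // 32 = 80
```

Under that assumption, each of the 32 heads works on an 80-dimensional slice of the 2560-dimensional hidden state.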

  • Trained with the AdamW optimizer at a peak learning rate of 4.0e-4
  • Uses a cosine learning rate schedule with warmup
  • Uses bfloat16 mixed-precision training
  • Supports context length of up to 4,096 tokens
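The learning rate schedule in the list above can be sketched as follows. Only the peak learning rate (4.0e-4) comes from the model card; the linear warmup shape, the warmup length, the total step count, and the final learning rate are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math

PEAK_LR = 4.0e-4  # peak learning rate quoted in the model card


def cosine_lr_with_warmup(step, total_steps, warmup_steps,
                          peak_lr=PEAK_LR, min_lr=0.0):
    """Return the learning rate at a given optimizer step.

    Assumptions (not from the model card): warmup is linear from 0 to
    peak_lr, and the cosine phase decays from peak_lr down to min_lr.
    """
    if total_steps <= 0 or not 0 <= warmup_steps <= total_steps:
        raise ValueError("need 0 <= warmup_steps <= total_steps and total_steps > 0")
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule: hold at the floor.
        return min_lr
    # Cosine decay phase: progress runs from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Illustrative values only: 1,000 warmup steps out of 10,000 total.
for s in (0, 500, 1000, 5500, 10000):
    print(s, cosine_lr_with_warmup(s, total_steps=10000, warmup_steps=1000))
```

With these illustrative values the rate climbs linearly to 4.0e-4 at step 1,000, falls to half the peak midway through the decay phase, and reaches the floor at the final step.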

Core Capabilities

  • Outperforms existing fully open models across multiple benchmarks
  • Achieves a 61.33% average score across standard benchmarks
  • Scores 53.85% on ARC Challenge and 73.16% on ARC Easy
  • Strong performance in knowledge-intensive tasks

Frequently Asked Questions

Q: What makes this model unique?

Instella-3B-Stage1 is fully open-source, was trained on AMD Instinct MI300X GPUs, and achieves competitive results with significantly fewer training tokens than similar models.

Q: What are the recommended use cases?

The model is designed for research purposes and excels in tasks requiring natural language understanding, including question answering, reasoning, and knowledge-intensive applications. However, it's not recommended for safety-critical or medical applications.
