Instella-3B

Maintained By
amd

| Property | Value |
|---|---|
| Parameter Count | 3.11B |
| Context Length | 4,096 tokens |
| Training Tokens | 4.15 trillion |
| License | ResearchRAIL |
| Model Type | Decoder-only Transformer |
| Architecture | 36 layers, 32 attention heads |

What is Instella-3B?

Instella-3B is AMD's open-source language model, trained from scratch on AMD Instinct MI300X GPUs. This 3-billion-parameter model outperforms existing fully open models on average and is competitive with state-of-the-art open-weight models such as Llama-3.2-3B and Gemma-2-2B.

Implementation Details

The model was trained with a multi-stage approach using FlashAttention-2, torch.compile, and Fully Sharded Data Parallel (FSDP) with hybrid sharding. Training ran on 128 MI300X GPUs distributed across 16 nodes, using bfloat16 mixed precision for efficient resource utilization.

  • 36 decoder layers with 32 attention heads
  • 4,096-token context length and a ~50,000-token vocabulary
  • Trained on 4.15 trillion tokens across multiple stages
  • Uses the OLMo tokenizer
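As a sanity check, the listed architecture roughly reproduces the stated 3.11B parameter count. The hidden size, head dimension, MLP expansion ratio, and untied embeddings below are assumptions (the card does not state them), so this is a back-of-envelope sketch, not the exact configuration:

```python
# Back-of-envelope check that the listed architecture is consistent with
# the stated 3.11B parameter count. Hidden size is NOT given in the card;
# 2560 (32 heads x 80-dim) is an assumption for this estimate.
layers = 36
heads = 32
head_dim = 80                   # assumed
d_model = heads * head_dim      # 2560, assumed
vocab = 50_000                  # "~50,000-token vocabulary"
mlp_ratio = 4                   # assumed standard 4x FFN expansion

attn_params = 4 * d_model * d_model              # Q, K, V, output projections
mlp_params = 2 * mlp_ratio * d_model * d_model   # up + down projections
per_layer = attn_params + mlp_params
embed_params = 2 * vocab * d_model               # untied in/out embeddings (assumed)

total = layers * per_layer + embed_params
print(f"estimated parameters: {total / 1e9:.2f}B")        # lands near 3.09B

# Data budget: training tokens seen per parameter
tokens = 4.15e12
print(f"tokens per parameter: {tokens / 3.11e9:,.0f}")    # ~1,334
```

With these assumed dimensions the estimate lands near 3.09B, close to the published 3.11B; the small remainder is plausibly layer norms, biases, or slightly different dimensions. The ~1,334 tokens-per-parameter data budget is far above the ~20 tokens/parameter Chinchilla-optimal heuristic, a common choice for small models aimed at strong quality at low inference cost.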

Core Capabilities

  • Outperforms existing open models by 8.08% on average across benchmarks
  • Exceptional performance in mathematical reasoning (GSM8K: +48.98% improvement)
  • Strong instruction-following capabilities through SFT and DPO training
  • Competitive with leading open-weight models across a range of tasks

Frequently Asked Questions

Q: What makes this model unique?

Instella-3B stands out for being fully open source while achieving performance competitive with leading open-weight models. It also demonstrates that AMD Instinct hardware is viable for large-scale AI training, making it a notable step for accessible AI research.

Q: What are the recommended use cases?

The model excels in instruction following, mathematical reasoning, and general language understanding tasks. However, it's designed for research purposes only and should not be used in safety-critical situations or medical applications.
