Instella-3B-Instruct

Maintained by: amd

  • Parameter Count: 3.11B
  • Context Length: 4,096 tokens
  • Architecture: 36 layers, 32 attention heads
  • License: ResearchRAIL
  • Training Tokens: 4.15 trillion

What is Instella-3B-Instruct?

Instella-3B-Instruct is AMD's latest instruction-tuned language model, developed as part of the company's commitment to open-source AI research. Trained on AMD Instinct MI300X GPUs, the model represents a significant step forward for fully open language models, achieving performance that rivals comparable open-weight models while maintaining complete transparency in its development process.

Implementation Details

The model uses a transformer-based, decoder-only architecture with 36 layers and 32 attention heads. Training relied on advanced techniques including FlashAttention-2, torch.compile, and Fully Sharded Data Parallelism (FSDP) with hybrid sharding, and proceeded in multiple stages: first-stage pre-training (4.065T tokens), second-stage pre-training (57.575B tokens), supervised fine-tuning, and direct preference optimization (DPO). A loading sketch follows the list below.

  • Vocabulary of ~50,000 tokens using the OLMo tokenizer
  • 4,096 token context length
  • bfloat16 mixed-precision training
  • Trained across 128 Instinct MI300X GPUs
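
As a quick way to verify these specifications locally, the model can be loaded through the Hugging Face transformers library and its configuration inspected. The sketch below is illustrative rather than official usage: it assumes the checkpoint is published on the Hub as amd/Instella-3B-Instruct, that it ships custom modeling code (hence trust_remote_code=True), and that its config exposes the usual transformers attribute names.

```python
# Minimal, illustrative sketch: load Instella-3B-Instruct and inspect its
# configuration. Assumes the checkpoint is published on the Hugging Face Hub
# as "amd/Instella-3B-Instruct" and ships custom modeling code (hence
# trust_remote_code=True); the printed attribute names follow common
# transformers conventions and may differ for this architecture.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-Instruct"  # assumed Hub identifier

config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
print(config)  # expect ~36 layers, 32 attention heads, ~50k vocab, 4,096-token context

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training precision noted above
    device_map="auto",
    trust_remote_code=True,
)
```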

Core Capabilities

  • Outperforms existing fully open models by 14.37% on average
  • Competitive performance with Llama-3.2-3B and Qwen2.5-3B
  • Strong performance in instruction following and multi-turn QA tasks
  • Enhanced capabilities in mathematical reasoning and knowledge recall

Frequently Asked Questions

Q: What makes this model unique?

Instella-3B-Instruct stands out for being fully open while achieving performance comparable to leading open-weight models such as Llama-3.2-3B and Qwen2.5-3B. It was trained entirely on AMD Instinct MI300X GPUs with state-of-the-art training techniques, making it a notable milestone in open AI development.

Q: What are the recommended use cases?

The model excels in instruction following, multi-turn QA tasks, and mathematical reasoning. However, it's intended for research purposes only and should not be used in safety-critical situations, health applications, or scenarios requiring high factual accuracy.
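
For the instruction-following and multi-turn QA use cases above, inference typically goes through the tokenizer's chat template. The snippet below is a minimal, hypothetical example under the same assumptions as the earlier sketch (Hub identifier amd/Instella-3B-Instruct, custom code enabled, a chat template bundled with the tokenizer).

```python
# Minimal, hypothetical multi-turn QA example via the tokenizer's chat template.
# Same assumptions as the earlier sketch: "amd/Instella-3B-Instruct" is the Hub
# identifier, trust_remote_code=True is required, and a chat template is bundled.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/Instella-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "What is 17 * 24?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens (the assistant's reply).
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```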
