AMD-OLMo

AMD-OLMo is a series of 1B-class language models (~1.2B parameters) trained from scratch on AMD Instinct MI250 GPUs, offering strong performance across multiple benchmarks, with SFT and DPO variants available.

  • Parameter Count: 1.2B
  • License: Apache 2.0
  • Training Data: Dolma v1.7 (1.3T tokens)
  • Context Length: 2048 tokens
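The headline parameter count can be roughly reproduced from the architecture figures given under Implementation Details below. This is a back-of-the-envelope sketch: the SwiGLU feed-forward width of 8192 is an assumption carried over from the public OLMo-1B configuration, not a value stated in this card.

```python
# Rough sanity check of the ~1.2B parameter figure from the architecture
# described under Implementation Details (16 layers, hidden size 2048,
# vocab 50,280). The feed-forward width of 8192 is an assumption.
d_model, n_layers, vocab = 2048, 16, 50_280
d_ff = 8192

attn = 4 * d_model * d_model             # Q, K, V, and output projections
mlp = 3 * d_model * d_ff                 # SwiGLU: gate, up, and down projections
embeddings = vocab * d_model             # tied input/output embeddings
total = n_layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ≈ 1.18B, i.e. the rounded "1.2B"
```

Under these assumptions the estimate lands at about 1.18B, consistent with the rounded 1.2B figure above.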

What is AMD-OLMo?

AMD-OLMo is a series of 1B parameter language models developed by AMD, trained from scratch on AMD Instinct™ MI250 GPUs. The model comes in three variants: a base pre-trained model, a supervised fine-tuned (SFT) version, and a DPO-aligned model optimized for human preferences. Built on the OLMo architecture, it achieves impressive performance across various benchmarks while maintaining computational efficiency.
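Assuming the three variants are published under the AMD organization on Hugging Face (the repo ids below are assumptions worth verifying against the Hub), loading any of them is straightforward with Transformers; a minimal sketch:

```python
# Sketch of loading the three AMD-OLMo variants via Hugging Face Transformers.
# Repo ids are assumed from the AMD Hugging Face organization; verify before use.
VARIANTS = {
    "base": "amd/AMD-OLMo-1B",
    "sft": "amd/AMD-OLMo-1B-SFT",
    "dpo": "amd/AMD-OLMo-1B-SFT-DPO",
}

def load_variant(name="sft"):
    # transformers is imported lazily so the mapping above is usable without it
    from transformers import AutoModelForCausalLM, AutoTokenizer
    repo = VARIANTS[name]
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    return tokenizer, model
```

For chat-style use, the SFT and DPO variants are the natural choice; the base model is better suited to further fine-tuning.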

Implementation Details

The model architecture consists of 16 layers with 16 attention heads and a hidden size of 2048. Training was performed across 16 nodes, each containing 4 AMD Instinct™ MI250 GPUs, achieving a training throughput of 12,200 tokens/sec/GPU. The model uses a vocabulary of 50,280 tokens and applies a different learning-rate schedule to each training phase:

  • Pretraining: Cosine LR schedule with 4.0e-4 peak LR
  • SFT: Two-phase training on multiple instruction datasets
  • DPO: Alignment using UltraFeedback dataset
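The pretraining schedule above (cosine decay with a 4.0e-4 peak) can be sketched in a few lines. The warmup length and final-LR fraction below are illustrative placeholders, not values from this card:

```python
import math

def cosine_lr(step, total_steps, peak_lr=4.0e-4,
              warmup_steps=2000, final_lr_frac=0.1):
    """Linear warmup followed by cosine decay to a fraction of the peak LR."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    final_lr = peak_lr * final_lr_frac
    return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))
```

The LR rises linearly to the peak during warmup, then decays along a half-cosine to the floor at the final step.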

Core Capabilities

  • Strong performance on reasoning tasks (63.64% on ARC Easy)
  • Competitive instruction-following abilities (54.22% win rate on AlpacaEval)
  • Balanced performance across multiple benchmarks including MMLU and GSM8K
  • Responsible AI considerations with evaluations on ToxiGen and TruthfulQA

Frequently Asked Questions

Q: What makes this model unique?

AMD-OLMo stands out for its efficient training on AMD hardware and competitive performance despite its relatively small size. It demonstrates strong capabilities across various benchmarks and offers multiple variants optimized for different use cases.

Q: What are the recommended use cases?

The model is best suited for research purposes and general language tasks. However, it's not recommended for safety-critical situations, health applications, or cases requiring high factual accuracy. Users should implement appropriate safety filters based on their specific use cases.
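A safety filter in this spirit can start as a simple post-generation check. The blocklist below is a hypothetical placeholder; a real deployment would replace it with a vetted classifier or moderation policy:

```python
# Minimal sketch of a post-generation safety filter, as recommended above.
# BLOCKED_TERMS is an illustrative placeholder, not a vetted policy.
BLOCKED_TERMS = {"medical diagnosis", "dosage"}

def passes_filter(text: str) -> bool:
    """Return False if the generated text contains a blocked term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)
```

In practice such a keyword check would sit behind a stronger layer, e.g. a toxicity classifier, given the ToxiGen and TruthfulQA caveats noted above.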
