AMD-OLMo

Maintained by: amd

Parameter Count: 1.2B
License: Apache 2.0
Training Data: Dolma v1.7 (1.3T tokens)
Context Length: 2048 tokens

What is AMD-OLMo?

AMD-OLMo is a series of 1B parameter language models developed by AMD, trained from scratch on AMD Instinct™ MI250 GPUs. The series comes in three variants: a base pre-trained model, a supervised fine-tuned (SFT) version, and a DPO-aligned model tuned on human preference data. Built on the OLMo architecture, it delivers competitive benchmark results for its size while remaining computationally efficient.
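
A minimal loading sketch, assuming the checkpoints are published under the amd organization on Hugging Face (e.g. amd/AMD-OLMo-1B; swap in the SFT or DPO variant as needed) and load with the standard transformers API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID assumed from the Hugging Face "amd" organization.
model_id = "amd/AMD-OLMo-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~1.2B params fits easily on one GPU in bf16
    device_map="auto",
)

inputs = tokenizer("Large language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```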

Implementation Details

The model architecture consists of 16 layers with 16 attention heads, a hidden size of 2048, and a vocabulary of 50,280 tokens. Training ran across 16 nodes, each containing 4 AMD Instinct™ MI250 GPUs, reaching a throughput of 12,200 tokens/sec/GPU. Distinct learning-rate schedules were used across the training phases:

  • Pretraining: Cosine LR schedule with a 4.0e-4 peak LR (sketched in the example after this list)
  • SFT: Two-phase training on multiple instruction datasets
  • DPO: Alignment using UltraFeedback dataset
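
To make the pretraining schedule concrete, here is a minimal cosine-decay-with-warmup sketch; the 4.0e-4 peak matches the figure above, while the warmup length and floor LR are illustrative assumptions rather than reported values:

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 4.0e-4,
              warmup_steps: int = 2000, min_lr: float = 4.0e-5) -> float:
    """Cosine decay to min_lr with linear warmup to peak_lr.

    peak_lr matches the reported 4.0e-4; warmup_steps and min_lr
    are assumptions for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup to the peak
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```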

Core Capabilities

  • Strong performance on reasoning tasks (63.64% on ARC Easy)
  • Competitive instruction-following abilities (54.22% win rate on AlpacaEval)
  • Balanced performance across multiple benchmarks including MMLU and GSM8K
  • Responsible AI considerations with evaluations on ToxiGen and TruthfulQA

Frequently Asked Questions

Q: What makes this model unique?

AMD-OLMo stands out for its efficient training on AMD hardware and competitive performance despite its relatively small size. It demonstrates strong capabilities across various benchmarks and offers multiple variants optimized for different use cases.

Q: What are the recommended use cases?

The model is best suited for research purposes and general language tasks. However, it's not recommended for safety-critical situations, health applications, or cases requiring high factual accuracy. Users should implement appropriate safety filters based on their specific use cases.
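
As a sketch of that last point, generation can be wrapped in a post-hoc filter; the model ID and keyword blocklist below are illustrative only, and production systems should rely on a proper moderation model or service rather than keyword matching:

```python
from transformers import pipeline

# Illustrative blocklist; replace with a real moderation layer in practice.
BLOCKLIST = {"example_banned_term"}

generator = pipeline("text-generation", model="amd/AMD-OLMo-1B-SFT")  # model ID assumed

def safe_generate(prompt: str, **gen_kwargs) -> str:
    text = generator(prompt, max_new_tokens=128, **gen_kwargs)[0]["generated_text"]
    if any(term in text.lower() for term in BLOCKLIST):
        return "[response withheld by safety filter]"
    return text

print(safe_generate("Explain what a large language model is."))
```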
