# Amber
| Property | Value |
|---|---|
| Parameter Count | 6.7B |
| License | Apache 2.0 |
| Architecture | LLaMA |
| Paper | arXiv:2312.06550 |
| Training Data | 1.26T tokens |
## What is Amber?
Amber is an innovative 6.7B parameter language model developed by LLM360 as part of their Pebble model series. It represents a significant step towards transparent AI development, offering unprecedented access to training details, checkpoints, and intermediate results. Built on the LLaMA architecture, Amber is designed to make LLM training knowledge accessible to all researchers and developers.
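Because Amber follows the standard LLaMA architecture, it can be loaded like any causal language model in the Hugging Face `transformers` library. The sketch below is illustrative: the repository id `LLM360/Amber` and the BF16/GPU setup are assumptions to be checked against the official model card.

```python
# Minimal sketch: loading Amber for text generation with transformers.
# The repository id "LLM360/Amber" is an assumption; verify it on the
# official LLM360 model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Amber"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the published tensor type
    device_map="auto",           # requires the accelerate package
)

prompt = "The transparency of open LLM training matters because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```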
## Implementation Details
The model follows the LLaMA architecture with 32 attention heads, 32 hidden layers, and a hidden size of 4096. It processes sequences of up to 2048 tokens and uses a 32,000-token vocabulary. The training data spans diverse sources, including arXiv, Books, C4, RefinedWeb, StarCoder, StackExchange, and Wikipedia, for a total of 1.26 trillion tokens. The key hyperparameters are summarized below (a configuration sketch follows the list).
- Hidden size: 4096 with 11008 intermediate size in MLPs
- 32 attention heads with 32 hidden layers
- Utilizes RMSNorm with ε=1e-6
- BF16 tensor type for efficient computation
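For readers who want these hyperparameters in a concrete form, here is a minimal sketch that mirrors them as a Hugging Face `LlamaConfig`. It is an illustrative reconstruction from the figures above, not the official configuration file shipped with the model.

```python
# Sketch: the hyperparameters listed above expressed as a LlamaConfig.
from transformers import LlamaConfig

amber_config = LlamaConfig(
    vocab_size=32000,              # 32,000-token vocabulary
    hidden_size=4096,              # model (embedding) dimension
    intermediate_size=11008,       # MLP intermediate size
    num_hidden_layers=32,          # transformer blocks
    num_attention_heads=32,        # attention heads per block
    max_position_embeddings=2048,  # 2048-token context window
    rms_norm_eps=1e-6,             # RMSNorm epsilon
)
print(amber_config)
```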
## Core Capabilities
- Text generation and language understanding
- Competitive benchmark performance for its scale (ARC-C: 42.57, HellaSwag: 73.91)
- Access to 360 training checkpoints for research (see the loading sketch after this list)
- Comprehensive documentation and training logs
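Because the intermediate checkpoints are published alongside the final model, a specific training stage can be loaded by passing a `revision` to `from_pretrained`. The repository id and the `ckpt_100` revision name below are assumptions; check the official model card for the exact branch naming.

```python
# Sketch: pulling one of the 360 intermediate training checkpoints.
# The revision name "ckpt_100" is an assumed example, not a confirmed branch.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "ckpt_100"  # assumed branch/revision name
model = AutoModelForCausalLM.from_pretrained("LLM360/Amber", revision=checkpoint)
tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", revision=checkpoint)
print(f"Loaded {checkpoint}: {sum(p.numel() for p in model.parameters()):,} parameters")
```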
## Frequently Asked Questions
Q: What makes this model unique?
Amber stands out for its unprecedented transparency in the AI community. It provides complete access to all training checkpoints, fully-prepared pre-training datasets, and comprehensive training details, making it an invaluable resource for researchers and developers studying LLM development.
Q: What are the recommended use cases?
While Amber is not a SOTA model, it's ideal for research purposes, understanding LLM training processes, and developing applications where model transparency is crucial. It's particularly valuable for educational purposes and studying model evolution through training phases.
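As one illustration of studying model evolution, the hedged sketch below scores a fixed prompt against a few intermediate checkpoints and prints the language-modeling loss at each stage. The repository id and revision names are assumptions to be replaced with the actual ones from the model card.

```python
# Sketch: tracking how the language-modeling loss on a fixed prompt changes
# across training checkpoints. Revision names below are assumed examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
for checkpoint in ["ckpt_050", "ckpt_150", "ckpt_300"]:  # assumed names
    tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", revision=checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        "LLM360/Amber", revision=checkpoint, torch_dtype=torch.bfloat16
    )
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the causal LM return its cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{checkpoint}: loss = {loss.item():.3f}")
```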