# Amber
| Property | Value |
|---|---|
| Parameter Count | 6.7B |
| License | Apache 2.0 |
| Architecture | LLaMA |
| Paper | arXiv:2312.06550 |
| Training Data | 1.26T tokens |
## What is Amber?
Amber is an innovative 6.7B parameter language model developed by LLM360 as part of their Pebble model series. It represents a significant step towards transparent AI development, offering unprecedented access to training details, checkpoints, and intermediate results. Built on the LLaMA architecture, Amber is designed to make LLM training knowledge accessible to all researchers and developers.
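Because Amber follows the standard LLaMA architecture, it can be loaded like any causal language model in the Hugging Face `transformers` library. The sketch below is illustrative: the repository id `LLM360/Amber` and the BF16/GPU setup are assumptions to be checked against the official model card.

```python
# Minimal sketch: loading Amber for text generation with transformers.
# The repository id "LLM360/Amber" is an assumption; verify it on the
# official LLM360 model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Amber"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the published tensor type
    device_map="auto",           # requires the accelerate package
)

prompt = "The transparency of open LLM training matters because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```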
## Implementation Details
The model follows the LLaMA architecture with 32 attention heads, 32 hidden layers, and a hidden size of 4096. It processes sequences of up to 2048 tokens and uses a 32,000-token vocabulary. The training data spans diverse sources, including arXiv, Books, C4, RefinedWeb, StarCoder, StackExchange, and Wikipedia, for a total of 1.26 trillion tokens. The key hyperparameters are summarized below (a configuration sketch follows the list).
- Hidden size: 4096 with 11008 intermediate size in MLPs
- 32 attention heads with 32 hidden layers
- Utilizes RMSNorm with ε=1e-6
- BF16 tensor type for efficient computation
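For readers who want these hyperparameters in a concrete form, here is a minimal sketch that mirrors them as a Hugging Face `LlamaConfig`. It is an illustrative reconstruction from the figures above, not the official configuration file shipped with the model.

```python
# Sketch: the hyperparameters listed above expressed as a LlamaConfig.
from transformers import LlamaConfig

amber_config = LlamaConfig(
    vocab_size=32000,              # 32,000-token vocabulary
    hidden_size=4096,              # model (embedding) dimension
    intermediate_size=11008,       # MLP intermediate size
    num_hidden_layers=32,          # transformer blocks
    num_attention_heads=32,        # attention heads per block
    max_position_embeddings=2048,  # 2048-token context window
    rms_norm_eps=1e-6,             # RMSNorm epsilon
)
print(amber_config)
```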
## Core Capabilities
- Text generation and language understanding
- Competitive benchmark performance for its scale (ARC-C: 42.57, HellaSwag: 73.91)
- Access to 360 training checkpoints for research (see the loading sketch after this list)
- Comprehensive documentation and training logs
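Because the intermediate checkpoints are published alongside the final model, a specific training stage can be loaded by passing a `revision` to `from_pretrained`. The repository id and the `ckpt_100` revision name below are assumptions; check the official model card for the exact branch naming.

```python
# Sketch: pulling one of the 360 intermediate training checkpoints.
# The revision name "ckpt_100" is an assumed example, not a confirmed branch.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "ckpt_100"  # assumed branch/revision name
model = AutoModelForCausalLM.from_pretrained("LLM360/Amber", revision=checkpoint)
tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", revision=checkpoint)
print(f"Loaded {checkpoint}: {sum(p.numel() for p in model.parameters()):,} parameters")
```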
## Frequently Asked Questions
Q: What makes this model unique?
Amber stands out for its unprecedented transparency in the AI community. It provides complete access to all training checkpoints, fully-prepared pre-training datasets, and comprehensive training details, making it an invaluable resource for researchers and developers studying LLM development.
Q: What are the recommended use cases?
While Amber is not a SOTA model, it's ideal for research purposes, understanding LLM training processes, and developing applications where model transparency is crucial. It's particularly valuable for educational purposes and studying model evolution through training phases.
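As one illustration of studying model evolution, the hedged sketch below scores a fixed prompt against a few intermediate checkpoints and prints the language-modeling loss at each stage. The repository id and revision names are assumptions to be replaced with the actual ones from the model card.

```python
# Sketch: tracking how the language-modeling loss on a fixed prompt changes
# across training checkpoints. Revision names below are assumed examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

text = "The quick brown fox jumps over the lazy dog."
for checkpoint in ["ckpt_050", "ckpt_150", "ckpt_300"]:  # assumed names
    tokenizer = AutoTokenizer.from_pretrained("LLM360/Amber", revision=checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        "LLM360/Amber", revision=checkpoint, torch_dtype=torch.bfloat16
    )
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the causal LM return its cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{checkpoint}: loss = {loss.item():.3f}")
```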