Cerebras-GPT-111M

Maintained By: cerebras

Property          Value
Parameter Count   111M
License           Apache 2.0
Paper             arXiv:2304.03208
Training Data     The Pile
Context Length    2048 tokens

What is Cerebras-GPT-111M?

Cerebras-GPT-111M belongs to the Cerebras-GPT family of language models, which was released to support research on LLM scaling laws. It has 111 million parameters and was trained on The Pile following the Chinchilla compute-optimal recipe. The model has 10 transformer layers, a model dimension of 768, and 12 attention heads. Its small size keeps it cheap to train, fine-tune, and evaluate in research experiments.
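As a sanity check on the 111M figure, the sketch below counts parameters for a GPT-2/GPT-3 style decoder. It uses the 10 layers and model dimension of 768 given here, plus the 50257-token vocabulary and 2048-token context listed elsewhere on this card. Several choices are assumptions that this card does not confirm: a feed-forward width of 4 × 768 = 3072, biases on every linear layer, learned position embeddings, and an output head tied to the token embedding. The 12 attention heads do not change the count because they split the 768-wide projections rather than adding weights. `gpt_param_count` is a hypothetical helper.

```python
def gpt_param_count(n_layers, d_model, n_heads, vocab_size, n_ctx, ffn_mult=4):
    """Estimate parameters of a GPT-2/GPT-3 style decoder (assumed tied embeddings)."""
    # Heads split the d_model-wide projections, so they add no parameters.
    assert d_model % n_heads == 0
    d_ffn = ffn_mult * d_model  # assumed 4x feed-forward width
    # Token embeddings + learned position embeddings; output head assumed tied.
    embeddings = vocab_size * d_model + n_ctx * d_model
    # Attention: fused Q/K/V projection and output projection, each with a bias.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: up-projection and down-projection, each with a bias.
    mlp = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    per_layer = attn + mlp + norms
    final_norm = 2 * d_model
    return embeddings + n_layers * per_layer + final_norm


total = gpt_param_count(n_layers=10, d_model=768, n_heads=12,
                        vocab_size=50257, n_ctx=2048)
print(f"{total:,}")  # 111,050,496
```

Under these assumptions the estimate comes to about 111.05M, which matches the model's name. Other choices, such as an untied output head, would change the total.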

Implementation Details

The model uses a GPT-3 style decoder architecture with full (dense) attention. It was trained with the AdamW optimizer (β1 = 0.9, β2 = 0.95) on the Andromeda AI supercomputer using Cerebras weight streaming. Weight streaming separates model weight storage from compute, which lets training scale across nodes through simple data parallelism.
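Here is a minimal pure-Python sketch of one AdamW step (Adam with decoupled weight decay) that makes the optimizer settings concrete. It uses β1 = 0.9 and β2 = 0.95 together with the 6.0E-04 learning rate listed below. The epsilon of 1e-8 and weight decay of 0.1 are assumptions that this card does not state, and no learning-rate schedule is modeled. `adamw_step` and `init_state` are hypothetical helpers for illustration, not the training code Cerebras used.

```python
import math


def init_state(n):
    """Zeroed first/second moment estimates for n scalar parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}


def adamw_step(params, grads, state, lr=6.0e-4, betas=(0.9, 0.95),
               eps=1e-8, weight_decay=0.1):  # eps and weight_decay are assumed values
    """Return updated params; moment lists in `state` are updated in place."""
    beta1, beta2 = betas
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moments.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        # Decoupled weight decay: applied to the weight directly, not folded into g.
        new_params.append(p - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p))
    return new_params


state = init_state(1)
print(adamw_step([1.0], [0.5], state, weight_decay=0.0))
```

With a fresh state, the bias-corrected step has magnitude close to the learning rate regardless of the gradient's scale. That is why the first update above moves the weight by roughly 6e-4.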

  • Vocabulary Size: 50257 tokens
  • Training Steps: 9037
  • Batch Size: 120 sequences
  • Learning Rate: 6.0E-04
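The step count and batch size above determine the total number of training tokens. The sketch below assumes every sequence in a batch is packed to the full 2048-token context length. `tokens_seen` is a hypothetical helper.

```python
def tokens_seen(steps, batch_sequences, seq_len):
    """Tokens per optimizer step and total tokens, assuming fully packed sequences."""
    tokens_per_step = batch_sequences * seq_len
    return tokens_per_step, steps * tokens_per_step


per_step, total_tokens = tokens_seen(steps=9037, batch_sequences=120, seq_len=2048)
print(per_step)      # 245760
print(total_tokens)  # 2220933120
print(round(total_tokens / 111e6, 2))  # 20.01 tokens per parameter
```

The result is about 2.22B tokens, or roughly 20 tokens for each of the model's 111M parameters.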

Core Capabilities

  • Text Generation and Completion
  • Zero-shot and Few-shot Learning
  • Research Applications in Language Model Scaling
  • Foundation for Fine-tuning Tasks
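Text generation and few-shot use both come down to plain prompts. The following sketch assumes the Hugging Face `transformers` library (with PyTorch) and the `cerebras/Cerebras-GPT-111M` model id on the Hugging Face Hub. It builds a few-shot prompt and runs greedy generation. The `build_few_shot_prompt` and `generate` helpers and the `Input:`/`Output:` prompt format are hypothetical illustrations, not an official interface. A model of this size may not follow few-shot patterns reliably.

```python
def build_few_shot_prompt(examples, query):
    """Format (input, output) demonstration pairs followed by an unanswered query."""
    blocks = [f"Input: {source}\nOutput: {target}\n" for source, target in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n".join(blocks)


def generate(prompt, max_new_tokens=20, model_id="cerebras/Cerebras-GPT-111M"):
    """Greedy continuation of `prompt`; downloads the model on first use."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,                      # greedy decoding
        pad_token_id=tokenizer.eos_token_id,  # the GPT-2 style tokenizer has no pad token
    )
    # Decode only the tokens generated after the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (requires network access and PyTorch):
# print(generate(build_few_shot_prompt([("cat", "chat"), ("dog", "chien")], "house")))
```

Keep the prompt plus `max_new_tokens` within the 2048-token context length.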

Frequently Asked Questions

Q: What makes this model unique?

Cerebras-GPT-111M is a research-oriented model trained strictly according to Chinchilla scaling laws, at 20 tokens per parameter. Every model in the Cerebras-GPT family follows the same compute-optimal recipe, so this one serves as a consistent small-scale baseline for studying how LLM behavior changes with scale. It is also a starting point for further research and development.
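The Chinchilla rule above can be turned into a quick budget estimate. This sketch uses the card's 20 tokens per parameter. It also uses the common approximation of about 6 FLOPs per parameter per training token (forward plus backward pass), which is a standard rule of thumb rather than a figure from this card. `chinchilla_budget` is a hypothetical helper.

```python
TOKENS_PER_PARAM = 20  # Chinchilla ratio used for Cerebras-GPT


def chinchilla_budget(n_params, tokens_per_param=TOKENS_PER_PARAM):
    """Return (compute-optimal token count, approximate training FLOPs)."""
    n_tokens = tokens_per_param * n_params
    # Rule of thumb: C ~= 6 * N * D for a dense transformer.
    flops = 6 * n_params * n_tokens
    return n_tokens, flops


tokens, flops = chinchilla_budget(111e6)
print(f"{tokens:.3g} tokens, {flops:.3g} FLOPs")  # 2.22e+09 tokens, 1.48e+18 FLOPs
```

Because the token budget grows linearly with model size, training compute under this recipe grows roughly with the square of the parameter count.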

Q: What are the recommended use cases?

The model is best suited for research purposes, including studying LLM scaling laws, architectural improvements, and as a base model for fine-tuning experiments. It's not recommended for production deployment without additional safety-related testing and mitigations.
