Cerebras-GPT-6.7B

Maintained by: cerebras

Property          Value
Parameter Count   6.7 billion
Architecture      GPT-3 style
License           Apache 2.0
Paper             arXiv:2304.03208
Training Data     The Pile
Context Length    2048 tokens

What is Cerebras-GPT-6.7B?

Cerebras-GPT-6.7B is a large language model developed by Cerebras Systems as part of its research into LLM scaling laws. It is one of the larger models in the Cerebras-GPT family and follows the Chinchilla scaling laws: it is trained on 20 tokens per model parameter, a ratio chosen for compute-optimal training.
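The 20-tokens-per-parameter rule translates directly into a training token budget. The arithmetic is sketched below as a minimal illustration; the function name is hypothetical, and the nominal 6.7 billion figure is used in place of the exact parameter count, so the result is approximate.

```python
# Chinchilla-style token budget: about 20 training tokens per model parameter.
TOKENS_PER_PARAM = 20            # ratio stated in the model description
N_PARAMS = 6_700_000_000         # nominal parameter count (6.7 billion)


def token_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the number of training tokens implied by the scaling rule."""
    return n_params * tokens_per_param


tokens = token_budget(N_PARAMS)
print(f"Training tokens: {tokens:,} (~{tokens / 1e9:.0f}B)")
```

Under these assumptions the budget comes to roughly 134 billion tokens.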

Implementation Details

The model has 32 transformer layers, a hidden dimension of 4096, and 32 attention heads. It was trained on the Andromeda AI supercomputer using Cerebras' weight streaming technology, which allows training to scale across nodes with simple data parallelism. Unlike the original GPT-3, which uses sparse banded attention, the model uses full (dense) attention.

  • Training batch size: 1040 sequences
  • Learning rate: 1.2E-04
  • Vocabulary size: 50257
  • Optimizer: AdamW with β1=0.9, β2=0.95

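As a sanity check, the figures above can be combined to estimate the parameter count and the number of tokens per batch. This is a rough sketch, not the authors' accounting: it assumes a standard GPT-2/GPT-3 block (4x-wide MLP), learned position embeddings, and an output layer tied to the token embeddings, and it ignores biases and layer norms. The function name is hypothetical.

```python
# Figures taken from the model description above.
N_LAYERS = 32
D_MODEL = 4096
VOCAB_SIZE = 50257
CONTEXT_LENGTH = 2048
BATCH_SEQUENCES = 1040


def approx_param_count(n_layers: int, d_model: int, vocab: int, context: int) -> int:
    """Estimate parameters of a GPT-style decoder (assumptions in the lead-in)."""
    # Per layer: 4*d^2 for attention (Q, K, V, output) + 8*d^2 for a 4x-wide MLP.
    per_layer = 12 * d_model ** 2
    # Token embeddings plus learned position embeddings (assumed).
    embeddings = (vocab + context) * d_model
    return n_layers * per_layer + embeddings


params = approx_param_count(N_LAYERS, D_MODEL, VOCAB_SIZE, CONTEXT_LENGTH)
tokens_per_batch = BATCH_SEQUENCES * CONTEXT_LENGTH

print(f"Estimated parameters: {params:,} (~{params / 1e9:.2f}B)")
print(f"Tokens per batch:     {tokens_per_batch:,}")
```

The estimate lands close to the nominal 6.7 billion parameters, and each batch of 1040 full-length sequences holds about 2.13 million tokens.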
Core Capabilities

  • Zero-shot task performance without task-specific fine-tuning
  • Scores 0.739 on PIQA and 0.636 on Lambada in zero-shot settings
  • Text generation with a choice of decoding and sampling strategies
  • Loads and runs through Hugging Face's transformers library
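The last two points can be sketched together: loading the model through transformers and switching between decoding strategies. This is a minimal sketch under stated assumptions. The repository id `cerebras/Cerebras-GPT-6.7B`, the helper names, and the specific decoding values (beam count, top-k, top-p, temperature) are illustrative choices, not settings recommended by the model authors.

```python
def generation_kwargs(strategy: str, max_new_tokens: int = 50) -> dict:
    """Return keyword arguments for `model.generate` for a named strategy."""
    common = {"max_new_tokens": max_new_tokens}
    if strategy == "greedy":
        # Always pick the most likely next token.
        return {**common, "do_sample": False}
    if strategy == "beam":
        # Keep several candidate continuations and return the best one.
        return {**common, "do_sample": False, "num_beams": 5,
                "early_stopping": True, "no_repeat_ngram_size": 2}
    if strategy == "sample":
        # Draw tokens at random from a truncated, temperature-scaled distribution.
        return {**common, "do_sample": True, "top_k": 50,
                "top_p": 0.95, "temperature": 0.8}
    raise ValueError(f"unknown strategy: {strategy!r}")


def generate_text(prompt: str, strategy: str = "greedy",
                  model_id: str = "cerebras/Cerebras-GPT-6.7B") -> str:
    """Generate a continuation of `prompt` (downloads the weights on first use)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)   # assumed repository id
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(strategy))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Generative AI is ", strategy="sample")` would download the full 6.7B-parameter checkpoint, so it needs substantial memory and disk space; the same code can be tried first with a smaller checkpoint by passing a different `model_id`.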

Frequently Asked Questions

Q: What makes this model unique?

The model follows the Chinchilla scaling laws (20 training tokens per parameter) and was trained on the Andromeda AI supercomputer. The goal is compute efficiency: strong performance for a given training budget, achieved by balancing model size against the amount of training compute.

Q: What are the recommended use cases?

The model is primarily intended for research in NLP, ethics, and alignment. It can be fine-tuned for specific applications, but it is not recommended for direct production deployment without additional safety measures and testing.
