OPT-66B
| Property | Value |
| --- | --- |
| Developer | Meta AI (Facebook) |
| Model Type | Decoder-only Transformer |
| Training Data | ~180B tokens (~800GB) |
| License | Other (Research Only) |
| Paper | Open Pre-trained Transformer Language Models |
What is OPT-66B?
OPT-66B is part of Meta AI's Open Pre-trained Transformer (OPT) series, designed to broaden access to large language models for research. This 66-billion-parameter model is among the largest openly released language models, trained to roughly match the capabilities of GPT-3-class models while promoting transparent and responsible AI research.
Implementation Details
The model uses a decoder-only transformer architecture trained with a causal language modeling objective on a diverse corpus that includes BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, and CCNewsV2. It employs GPT-2's byte-level BPE tokenizer with a 50,272-token vocabulary and processes sequences of up to 2,048 tokens.
- Training Infrastructure: 992 80GB A100 GPUs (figures reported for the flagship OPT-175B run)
- Training Duration: roughly 33 days of continuous training (also reported for OPT-175B)
- Tokenization: GPT-2 byte-level BPE
- Context Window: 2,048 tokens
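As a concrete starting point, here is a minimal loading sketch using the Hugging Face Transformers library (not part of the official OPT release); it fetches the `facebook/opt-66b` checkpoint and reads the vocabulary size and context window quoted above back from the model config. The fp16 weights and `device_map="auto"` sharding are assumptions made so the 66B parameters can be split across several GPUs; adjust them to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b",
    torch_dtype=torch.float16,  # assumption: load fp16 weights to halve memory use
    device_map="auto",          # assumption: shard across available GPUs (needs `accelerate`)
)

# These values correspond to the figures quoted in this section.
print(model.config.vocab_size)               # 50272-entry byte-level BPE vocabulary
print(model.config.max_position_embeddings)  # 2048-token context window
```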
Core Capabilities
- Text Generation and Completion
- Zero-shot and Few-shot Learning
- Natural Language Understanding
- Custom Text Generation with Sampling
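The sampling capability listed above can be exercised through the standard Transformers `generate` API. The sketch below is an illustrative example only: the prompt, seed, `top_p`, and `temperature` values are arbitrary choices, not settings recommended by Meta AI.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(32)  # fix the RNG so the sampled continuation is reproducible

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, I am conscious and", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,    # sample instead of greedy decoding
    top_p=0.9,         # nucleus sampling cutoff (assumed value)
    temperature=0.8,   # softens the token distribution (assumed value)
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```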
Frequently Asked Questions
Q: What makes this model unique?
OPT-66B stands out as one of the largest openly released language models available for research, offering capabilities comparable to GPT-3 while promoting transparency in AI development.
Q: What are the recommended use cases?
The model is best suited for research purposes, text generation tasks, and downstream task fine-tuning. It's particularly useful for studying large language model behavior, bias, and capabilities in a research context.
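For the zero-shot and few-shot research uses mentioned above, a common pattern is to place worked examples directly in the prompt. The sketch below shows this with the Transformers `text-generation` pipeline; the sentiment task and in-context examples are hypothetical and serve only to illustrate the prompt format (no fine-tuning is performed).

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="facebook/opt-66b",
    torch_dtype=torch.float16,  # assumption: fp16 weights
    device_map="auto",          # assumption: shard across available GPUs
)

# The in-context examples condition the model on the task;
# no gradient updates or fine-tuning are involved.
few_shot_prompt = (
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A moving, beautifully shot film. Sentiment: positive\n"
    "Review: I could not stop laughing the whole time. Sentiment:"
)
result = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])
```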