OPT-66B
| Property | Value |
| --- | --- |
| Developer | Meta AI (Facebook) |
| Model Type | Decoder-only Transformer |
| Training Data | ~180B tokens (~800GB) |
| License | Other (Research Only) |
| Paper | Open Pre-trained Transformer Language Models |
What is OPT-66B?
OPT-66B is part of Meta AI's Open Pre-trained Transformer (OPT) series, designed to broaden access to large language models for research. This 66-billion-parameter model is among the largest openly released language models, trained to roughly match the capabilities of GPT-3-class models while promoting transparent and responsible AI research.
Implementation Details
The model uses a decoder-only transformer architecture trained with a causal language modeling objective on a diverse corpus that includes BookCorpus, CC-Stories, The Pile, Pushshift.io Reddit, and CCNewsV2. It employs GPT-2's byte-level BPE tokenizer with a 50,272-token vocabulary and processes sequences of up to 2,048 tokens.
- Training Infrastructure: 992 80GB A100 GPUs (figures reported for the flagship OPT-175B run)
- Training Duration: roughly 33 days of continuous training (also reported for OPT-175B)
- Tokenization: GPT-2 byte-level BPE
- Context Window: 2,048 tokens
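As a concrete starting point, here is a minimal loading sketch using the Hugging Face Transformers library (not part of the official OPT release); it fetches the `facebook/opt-66b` checkpoint and reads the vocabulary size and context window quoted above back from the model config. The fp16 weights and `device_map="auto"` sharding are assumptions made so the 66B parameters can be split across several GPUs; adjust them to your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b",
    torch_dtype=torch.float16,  # assumption: load fp16 weights to halve memory use
    device_map="auto",          # assumption: shard across available GPUs (needs `accelerate`)
)

# These values correspond to the figures quoted in this section.
print(model.config.vocab_size)               # 50272-entry byte-level BPE vocabulary
print(model.config.max_position_embeddings)  # 2048-token context window
```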
Core Capabilities
- Text Generation and Completion
- Zero-shot and Few-shot Learning
- Natural Language Understanding
- Custom Text Generation with Sampling
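The sampling capability listed above can be exercised through the standard Transformers `generate` API. The sketch below is an illustrative example only: the prompt, seed, `top_p`, and `temperature` values are arbitrary choices, not settings recommended by Meta AI.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(32)  # fix the RNG so the sampled continuation is reproducible

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-66b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-66b", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, I am conscious and", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    do_sample=True,    # sample instead of greedy decoding
    top_p=0.9,         # nucleus sampling cutoff (assumed value)
    temperature=0.8,   # softens the token distribution (assumed value)
    max_new_tokens=40,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```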
Frequently Asked Questions
Q: What makes this model unique?
OPT-66B stands out as one of the largest openly released language models available for research, offering capabilities comparable to GPT-3 while promoting transparency in AI development.
Q: What are the recommended use cases?
The model is best suited for research purposes, text generation tasks, and downstream task fine-tuning. It's particularly useful for studying large language model behavior, bias, and capabilities in a research context.
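For the zero-shot and few-shot research uses mentioned above, a common pattern is to place worked examples directly in the prompt. The sketch below shows this with the Transformers `text-generation` pipeline; the sentiment task and in-context examples are hypothetical and serve only to illustrate the prompt format (no fine-tuning is performed).

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="facebook/opt-66b",
    torch_dtype=torch.float16,  # assumption: fp16 weights
    device_map="auto",          # assumption: shard across available GPUs
)

# The in-context examples condition the model on the task;
# no gradient updates or fine-tuning are involved.
few_shot_prompt = (
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A moving, beautifully shot film. Sentiment: positive\n"
    "Review: I could not stop laughing the whole time. Sentiment:"
)
result = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(result[0]["generated_text"])
```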