OPT-66B

  • Developer: Meta AI (Facebook)
  • Model Type: Decoder-only Transformer
  • Training Data: 180B tokens (800GB)
  • License: Other (Research Only)
  • Paper: Open Pre-trained Transformer Language Models

What is OPT-66B?

OPT-66B is part of Meta AI's Open Pre-trained Transformer (OPT) series, designed to democratize access to large language models. This 66-billion-parameter model is a significant milestone in open-source AI: it was trained to roughly match GPT-3-class capabilities while supporting transparent and responsible AI research.

Implementation Details

The model uses a decoder-only transformer architecture trained with a causal language modeling objective on a diverse dataset including BookCorpus, CC-Stories, The Pile, Reddit, and CCNewsV2. It employs GPT-2's byte-level BPE tokenization with a 50,272-token vocabulary and processes sequences of up to 2,048 tokens.

  • Training Infrastructure: 992 80GB A100 GPUs (figures reported for the flagship OPT-175B run)
  • Training Duration: roughly 33 days of continuous training (also reported for OPT-175B)
  • Tokenization: GPT-2 byte-level BPE
  • Context Window: 2,048 tokens
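
For reference, below is a minimal, hedged sketch of loading the publicly available facebook/opt-66b checkpoint with the Hugging Face transformers library and generating a completion; the prompt, dtype, and device-mapping choices are illustrative assumptions, not an official recipe.

```python
# Minimal loading/generation sketch, assuming the "facebook/opt-66b" checkpoint
# on the Hugging Face Hub and a machine with enough GPU memory for fp16 weights
# (~132 GB); all generation settings here are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-66b"

# OPT reuses GPT-2's byte-level BPE; the tokenizer covers the 50,272-token vocabulary.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" (requires accelerate) shards layers across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Open-sourcing large language models matters because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy completion; the prompt plus new tokens must stay within the 2,048-token context.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```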

Core Capabilities

  • Text Generation and Completion
  • Zero-shot and Few-shot Learning
  • Natural Language Understanding
  • Custom Text Generation with Sampling (a few-shot sampling sketch follows this list)
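
As a rough illustration of the few-shot and sampling capabilities above, the sketch below reuses the model and tokenizer objects from the loading example; the prompt and the sampling hyperparameters (top_p, temperature) are arbitrary demonstration choices.

```python
# Few-shot prompting with nucleus sampling; reuses `model` and `tokenizer` from
# the loading sketch above. Prompt text and sampling values are illustrative only.
few_shot_prompt = (
    "Review: The food was cold and bland. Sentiment: negative\n"
    "Review: Absolutely loved the service! Sentiment: positive\n"
    "Review: The movie dragged on forever. Sentiment:"
)

inputs = tokenizer(few_shot_prompt, return_tensors="pt").to(model.device)

# do_sample=True switches from greedy decoding to temperature/top-p sampling.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    max_new_tokens=3,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```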

Frequently Asked Questions

Q: What makes this model unique?

OPT-66B stands out for being one of the largest open-source language models available for research, offering capabilities similar to GPT-3 while promoting transparency in AI development.

Q: What are the recommended use cases?

The model is best suited for research purposes, text generation tasks, and downstream task fine-tuning. It's particularly useful for studying large language model behavior, bias, and capabilities in a research context.
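
As one hedged example of downstream fine-tuning, the sketch below runs a standard causal-LM fine-tuning loop with the Hugging Face Trainer; the smaller facebook/opt-125m checkpoint, the wikitext-2 dataset, and all hyperparameters are placeholder choices for prototyping before scaling up to opt-66b.

```python
# Causal-LM fine-tuning sketch with the Hugging Face Trainer. The checkpoint,
# dataset, and hyperparameters below are placeholders: validate the pipeline on
# a small OPT model, then scale to facebook/opt-66b with appropriate hardware.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "facebook/opt-125m"  # swap for "facebook/opt-66b" once the pipeline works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Any text corpus works for causal-LM fine-tuning; a 1% slice keeps the demo fast.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
raw = raw.filter(lambda ex: len(ex["text"].strip()) > 0)  # drop empty lines

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="opt-finetuned",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```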
