OPT-350M
| Property | Value |
|---|---|
| Developer | Meta AI (Facebook) |
| Architecture | Decoder-only Transformer |
| License | Other (Custom) |
| Paper | OPT: Open Pre-trained Transformer Language Models |
What is OPT-350M?
OPT-350M is part of Meta AI's Open Pre-trained Transformer (OPT) series, released to broaden research access to large language models. This 350M-parameter model is one of the smaller, more accessible variants in the series, trained with a causal language modeling objective on a diverse dataset of 180B tokens.
Implementation Details
The model uses GPT-2's byte-level Byte Pair Encoding (BPE) tokenizer with a vocabulary of 50,272 tokens and processes sequences of 2,048 consecutive tokens (see the loading sketch after the list below). It was trained on a corpus combining BookCorpus, CC-Stories, selected components of The Pile, Pushshift.io Reddit data, and CCNewsV2.
- Pre-training objective: Causal Language Modeling (CLM)
- Primary language: English (with some multilingual content via CommonCrawl)
- Training data size: 800GB (180B tokens)
- Tokenization: GPT-2 byte-level BPE
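To make these details concrete, the sketch below loads the published `facebook/opt-350m` checkpoint with the Hugging Face transformers library, checks the vocabulary size and context length, and computes the causal language modeling loss on a sample sentence. The sample text is illustrative; the checkpoint name and API calls follow standard transformers usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published OPT-350M checkpoint and its GPT-2-style BPE tokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

print(model.config.vocab_size)               # 50272-token vocabulary
print(model.config.max_position_embeddings)  # 2048-token context window

# Causal language modeling: the model predicts each token from the tokens
# to its left, so the labels are simply the input ids themselves.
inputs = tokenizer("OPT models are decoder-only transformers.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # average next-token cross-entropy on this sentence
```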
Core Capabilities
- Text generation and completion (see the example after this list)
- Zero-shot and few-shot learning
- Custom prompt-based tasks
- Research and experimentation
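As an illustration of the generation and zero-shot prompting capabilities listed above, the sketch below uses the transformers `pipeline` API; the prompt and sampling settings are arbitrary examples, not tuned recommendations.

```python
from transformers import pipeline, set_seed

# Build a text-generation pipeline around the published checkpoint.
generator = pipeline("text-generation", model="facebook/opt-350m")
set_seed(0)  # make sampling reproducible

# Zero-shot completion: the model continues the prompt without any examples.
prompt = "The Open Pre-trained Transformer models were released to"
outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)
print(outputs[0]["generated_text"])
```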
Frequently Asked Questions
Q: What makes this model unique?
OPT-350M stands out for its open-access nature and research-friendly design, allowing researchers to study large language model behavior while requiring fewer computational resources than larger variants.
Q: What are the recommended use cases?
The model is best suited for text generation tasks, research purposes, and fine-tuning for specific downstream applications. It can be used with the Hugging Face transformers library for both inference and fine-tuning; a minimal fine-tuning sketch follows.
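The sketch below fine-tunes the checkpoint with the transformers `Trainer` API. The dataset (a 1% slice of WikiText-2), the 512-token block size, the output directory, and all hyperparameters are placeholder choices for illustration, not settings recommended by the OPT authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Small illustrative corpus; truncate well below the 2,048-token context
# to keep the example fast.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 1)  # drop empty lines

# For causal LM, the collator pads each batch and copies inputs to labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="opt-350m-finetuned",  # hypothetical output path
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```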