OPT-350M
| Property | Value |
|---|---|
| Developer | Meta AI (Facebook) |
| Architecture | Decoder-only Transformer |
| License | Other (Custom) |
| Paper | OPT: Open Pre-trained Transformer Language Models |
What is OPT-350M?
OPT-350M is part of Meta AI's Open Pre-trained Transformer (OPT) series, released to broaden research access to large language models. This 350M-parameter model is one of the smaller, more accessible variants in the series, trained with a causal language modeling objective on a diverse dataset of 180B tokens.
Implementation Details
The model uses GPT-2's byte-level Byte Pair Encoding (BPE) tokenizer with a vocabulary of 50,272 tokens and processes sequences of 2,048 consecutive tokens (see the loading sketch after the list below). It was trained on a corpus combining BookCorpus, CC-Stories, selected components of The Pile, Pushshift.io Reddit data, and CCNewsV2.
- Pre-training objective: Causal Language Modeling (CLM)
- Primary language: English (with some multilingual content via CommonCrawl)
- Training data size: 800GB (180B tokens)
- Tokenization: GPT-2 byte-level BPE
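To make these details concrete, the sketch below loads the published `facebook/opt-350m` checkpoint with the Hugging Face transformers library, checks the vocabulary size and context length, and computes the causal language modeling loss on a sample sentence. The sample text is illustrative; the checkpoint name and API calls follow standard transformers usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the published OPT-350M checkpoint and its GPT-2-style BPE tokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

print(model.config.vocab_size)               # 50272-token vocabulary
print(model.config.max_position_embeddings)  # 2048-token context window

# Causal language modeling: the model predicts each token from the tokens
# to its left, so the labels are simply the input ids themselves.
inputs = tokenizer("OPT models are decoder-only transformers.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # average next-token cross-entropy on this sentence
```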
Core Capabilities
- Text generation and completion (see the example after this list)
- Zero-shot and few-shot learning
- Custom prompt-based tasks
- Research and experimentation
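As an illustration of the generation and zero-shot prompting capabilities listed above, the sketch below uses the transformers `pipeline` API; the prompt and sampling settings are arbitrary examples, not tuned recommendations.

```python
from transformers import pipeline, set_seed

# Build a text-generation pipeline around the published checkpoint.
generator = pipeline("text-generation", model="facebook/opt-350m")
set_seed(0)  # make sampling reproducible

# Zero-shot completion: the model continues the prompt without any examples.
prompt = "The Open Pre-trained Transformer models were released to"
outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)
print(outputs[0]["generated_text"])
```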
Frequently Asked Questions
Q: What makes this model unique?
OPT-350M stands out for its open-access nature and research-friendly design, allowing researchers to study large language model behavior while requiring fewer computational resources than larger variants.
Q: What are the recommended use cases?
The model is best suited for text generation tasks, research purposes, and fine-tuning for specific downstream applications. It can be used with the Hugging Face transformers library for both inference and fine-tuning; a minimal fine-tuning sketch follows.
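The sketch below fine-tunes the checkpoint with the transformers `Trainer` API. The dataset (a 1% slice of WikiText-2), the 512-token block size, the output directory, and all hyperparameters are placeholder choices for illustration, not settings recommended by the OPT authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Small illustrative corpus; truncate well below the 2,048-token context
# to keep the example fast.
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 1)  # drop empty lines

# For causal LM, the collator pads each batch and copies inputs to labels.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="opt-350m-finetuned",  # hypothetical output path
    per_device_train_batch_size=2,
    num_train_epochs=1,
    learning_rate=5e-5,
    logging_steps=50,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```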