MPT-7B
| Property | Value |
|---|---|
| Parameters | 6.7B |
| License | Apache-2.0 |
| Training Tokens | 1T |
| Context Length | 2048 (extensible) |
| Architecture | Modified Decoder-Only Transformer |
What is MPT-7B?
MPT-7B is a groundbreaking decoder-style transformer model developed by MosaicML, trained on 1 trillion tokens of English text and code. It represents a significant advancement in open-source language models, particularly notable for its commercial usability and efficient architecture.
Implementation Details
The model implements several innovative architectural modifications, including FlashAttention for performance optimization and Attention with Linear Biases (ALiBi) for handling extended context lengths. Built using a custom MPT architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.
- Uses FlashAttention for optimized performance
- Implements ALiBi positioning instead of traditional positional embeddings
- Operates without traditional biases in the architecture
- Supports dynamic context length adjustment at load time (see the loading sketch below)
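For concreteness, here is a minimal loading sketch, assuming the Hugging Face `transformers` API and the public `mosaicml/mpt-7b` checkpoint; the exact config keys may vary between releases.

```python
import transformers

# Sketch: load MPT-7B with a non-default attention implementation and a
# longer maximum sequence length. Assumes the `mosaicml/mpt-7b` checkpoint
# and the config keys exposed by its custom modeling code.
name = "mosaicml/mpt-7b"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # FlashAttention-style Triton kernel, if installed
config.max_seq_len = 4096  # ALiBi lets the model extrapolate past the 2048 tokens seen in training

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,  # required because MPT ships custom modeling code
)
```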
Core Capabilities
- Handles extremely long input sequences thanks to ALiBi (the MPT-7B-StoryWriter-65k+ variant extrapolates up to 84k tokens)
- Supports fast training and inference through optimized implementations
- Provides commercial usage rights under Apache 2.0 license
- Enables efficient serving through HuggingFace pipelines and NVIDIA FasterTransformer
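As an illustration of the HuggingFace pipeline route, a minimal generation sketch follows; the checkpoint and tokenizer names (`mosaicml/mpt-7b`, `EleutherAI/gpt-neox-20b`) are assumptions based on the public release.

```python
import transformers

# Sketch: text generation through a Hugging Face pipeline.
# MPT-7B reuses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b", trust_remote_code=True
)

pipe = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("MosaicML released MPT-7B because", max_new_tokens=50, do_sample=True))
```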
Frequently Asked Questions
Q: What makes this model unique?
MPT-7B stands out for its commercial usage rights, extensive training data (1T tokens), ability to handle extremely long inputs through ALiBi, and optimized architecture for both training and inference.
Q: What are the recommended use cases?
The base model is designed for finetuning rather than direct deployment. It serves as a foundation for specialized models like MPT-7B-StoryWriter for long-form content, MPT-7B-Instruct for instruction following, and MPT-7B-Chat for dialogue generation.
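For example, swapping the base checkpoint for one of these finetuned variants is a one-line change; the checkpoint name and the instruction prompt format below are assumptions based on the public MPT-7B-Instruct release.

```python
import transformers

# Sketch: load a finetuned variant instead of the base model and run a
# single instruction. The prompt format shown is an assumption modeled on
# the dolly-style template used by the public MPT-7B-Instruct release.
name = "mosaicml/mpt-7b-instruct"

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain ALiBi in one sentence.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```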