MPT-7B
| Property | Value |
|---|---|
| Parameters | 6.7B |
| License | Apache-2.0 |
| Training Tokens | 1T |
| Context Length | 2048 (extensible) |
| Architecture | Modified Decoder-Only Transformer |
What is MPT-7B?
MPT-7B is a groundbreaking decoder-style transformer model developed by MosaicML, trained on 1 trillion tokens of English text and code. It represents a significant advancement in open-source language models, particularly notable for its commercial usability and efficient architecture.
Implementation Details
The model implements several innovative architectural modifications, including FlashAttention for performance optimization and Attention with Linear Biases (ALiBi) for handling extended context lengths. Built using a custom MPT architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.
- Uses FlashAttention for optimized performance
- Implements ALiBi positioning instead of traditional positional embeddings
- Operates without traditional biases in the architecture
- Supports dynamic context length adjustment at load time (see the loading sketch below)
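For concreteness, here is a minimal loading sketch, assuming the Hugging Face `transformers` API and the public `mosaicml/mpt-7b` checkpoint; the exact config keys may vary between releases.

```python
import transformers

# Sketch: load MPT-7B with a non-default attention implementation and a
# longer maximum sequence length. Assumes the `mosaicml/mpt-7b` checkpoint
# and the config keys exposed by its custom modeling code.
name = "mosaicml/mpt-7b"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # FlashAttention-style Triton kernel, if installed
config.max_seq_len = 4096  # ALiBi lets the model extrapolate past the 2048 tokens seen in training

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,  # required because MPT ships custom modeling code
)
```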
Core Capabilities
- Handles extremely long input sequences thanks to ALiBi (the MPT-7B-StoryWriter-65k+ variant extrapolates up to 84k tokens)
- Supports fast training and inference through optimized implementations
- Provides commercial usage rights under Apache 2.0 license
- Enables efficient serving through HuggingFace pipelines and NVIDIA FasterTransformer
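As an illustration of the HuggingFace pipeline route, a minimal generation sketch follows; the checkpoint and tokenizer names (`mosaicml/mpt-7b`, `EleutherAI/gpt-neox-20b`) are assumptions based on the public release.

```python
import transformers

# Sketch: text generation through a Hugging Face pipeline.
# MPT-7B reuses the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b", trust_remote_code=True
)

pipe = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("MosaicML released MPT-7B because", max_new_tokens=50, do_sample=True))
```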
Frequently Asked Questions
Q: What makes this model unique?
MPT-7B stands out for its commercial usage rights, extensive training data (1T tokens), ability to handle extremely long inputs through ALiBi, and optimized architecture for both training and inference.
Q: What are the recommended use cases?
The base model is designed for finetuning rather than direct deployment. It serves as a foundation for specialized models like MPT-7B-StoryWriter for long-form content, MPT-7B-Instruct for instruction following, and MPT-7B-Chat for dialogue generation.
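For example, swapping the base checkpoint for one of these finetuned variants is a one-line change; the checkpoint name and the instruction prompt format below are assumptions based on the public MPT-7B-Instruct release.

```python
import transformers

# Sketch: load a finetuned variant instead of the base model and run a
# single instruction. The prompt format shown is an assumption modeled on
# the dolly-style template used by the public MPT-7B-Instruct release.
name = "mosaicml/mpt-7b-instruct"

tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain ALiBi in one sentence.\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```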