MPT-7B

Maintained by: mosaicml

Parameters: 6.7B
License: Apache-2.0
Training Tokens: 1T
Context Length: 2048 (extensible)
Architecture: Modified Decoder-Only Transformer

What is MPT-7B?

MPT-7B is a decoder-style transformer model developed by MosaicML, trained on 1 trillion tokens of English text and code. It is notable among open-source language models for its commercially usable Apache-2.0 license and its efficient, optimized architecture.

Implementation Details

The model implements several innovative architectural modifications, including FlashAttention for performance optimization and Attention with Linear Biases (ALiBi) for handling extended context lengths. Built using a custom MPT architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.

  • Uses FlashAttention for optimized performance
  • Implements ALiBi positioning instead of traditional positional embeddings
  • Omits bias terms throughout the architecture
  • Supports dynamic context length adjustment (see the loading sketch after this list)
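As a sketch of how these pieces fit together, the snippet below loads the model with an extended ALiBi context window. The `max_seq_len` and `attn_config` fields belong to MosaicML's custom MPT config as described on their model card; treat the exact field names as assumptions if you are on a different release.

```python
# Minimal sketch: loading MPT-7B with an extended ALiBi context window.
# `max_seq_len` and `attn_config` are fields of MosaicML's custom MPT config;
# the exact names are assumptions if your release differs.
import torch
import transformers

name = "mosaicml/mpt-7b"
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 4096  # ALiBi lets inference exceed the 2048-token training length
# config.attn_config["attn_impl"] = "triton"  # optional: triton FlashAttention kernel

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```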

Core Capabilities

  • Handles extremely long input sequences (up to 84k tokens)
  • Supports fast training and inference through optimized implementations
  • Provides commercial usage rights under Apache 2.0 license
  • Enables efficient serving through HuggingFace pipelines and NVIDIA FasterTransformer (see the pipeline sketch after this list)
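To illustrate the HuggingFace pipeline path, here is a minimal serving sketch. The EleutherAI/gpt-neox-20b tokenizer pairing follows MosaicML's model card; the prompt and generation parameters are illustrative.

```python
# Minimal sketch: serving MPT-7B through a HuggingFace text-generation pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model = AutoModelForCausalLM.from_pretrained("mosaicml/mpt-7b", trust_remote_code=True)
# MPT ships without its own tokenizer; the model card pairs it with GPT-NeoX's.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("The key advantage of ALiBi is", max_new_tokens=50)[0]["generated_text"])
```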

Frequently Asked Questions

Q: What makes this model unique?

MPT-7B stands out for its commercial usage rights, extensive training data (1T tokens), ability to handle extremely long inputs through ALiBi, and optimized architecture for both training and inference.

Q: What are the recommended use cases?

The base model is designed for finetuning rather than direct deployment. It serves as a foundation for specialized models like MPT-7B-StoryWriter for long-form content, MPT-7B-Instruct for instruction following, and MPT-7B-Chat for dialogue generation.
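As a sketch, loading one of these specialized checkpoints uses the same pattern as the base model; the repository names below match the variants listed above.

```python
# Sketch: swapping in a specialized MPT-7B variant (same loading pattern as the base model).
import transformers

model = transformers.AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct",  # or "mosaicml/mpt-7b-chat" / "mosaicml/mpt-7b-storywriter"
    trust_remote_code=True,
)
```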
