MPT-30B-Chat
| Property | Value |
|---|---|
| Parameter Count | 29.95B |
| License | CC-By-NC-SA-4.0 |
| Context Length | 8192 tokens |
| Architecture | Modified decoder-only transformer |
| Release Date | June 22, 2023 |
What is MPT-30B-Chat?
MPT-30B-Chat is a language model developed by MosaicML and designed specifically for dialogue generation and multi-turn conversation. It was built by fine-tuning the base MPT-30B model on diverse conversational datasets including ShareGPT-Vicuna, Camel-AI, GPTeacher, and others. MosaicML reports that MPT-30B surpasses the quality of the original GPT-3, making the chat variant a significant step forward for open-source conversational models.
Implementation Details
The model employs a modified decoder-only transformer architecture with 48 layers, 64 attention heads, and a model dimension (d_model) of 7168. Several architectural choices improve its performance and efficiency.
- Implements FlashAttention for improved computational efficiency
- Uses ALiBi (Attention with Linear Biases) instead of traditional positional embeddings
- Features an 8K token context window with expansion capability
- Chat fine-tuning ran on 64 NVIDIA H100 GPUs for approximately 7.6 hours
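The 29.95B parameter figure can be roughly reproduced from the hyperparameters above. A decoder-only transformer has approximately 12·L·d² parameters across its attention and feed-forward blocks, plus the token embedding matrix. This is a back-of-the-envelope sketch; the vocabulary size of 50432 and the tied-embedding assumption come from MPT's configuration and should be treated as assumptions here:

```python
# Rough parameter count from the architecture hyperparameters
# (ignores biases, layer norms, and other small terms).
n_layers = 48
d_model = 7168
vocab_size = 50432  # assumption: MPT's padded GPT-NeoX-style vocabulary

# Per block: ~4*d^2 for attention (Q, K, V, output projections)
# plus ~8*d^2 for a feed-forward with 4x expansion (d -> 4d -> d).
block_params = 12 * n_layers * d_model ** 2

# Token embeddings (assuming input/output embeddings are tied, counted once).
embedding_params = vocab_size * d_model

total = block_params + embedding_params
print(f"~{total / 1e9:.2f}B parameters")
```

The estimate lands within rounding distance of the reported 29.95B, which is a useful sanity check that the listed hyperparameters are consistent.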
Core Capabilities
- Excels at multi-turn conversations and dialogue generation
- Strong coding abilities due to specialized pretraining data
- Supports context-length extrapolation via ALiBi
- Efficient inference and training performance
- Handles complex instruction following tasks
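The context-length extrapolation mentioned above comes from ALiBi: instead of learned position embeddings, each attention head adds a fixed linear penalty to its logits that grows with the distance between query and key. A minimal sketch of that bias computation (using 8 heads for readability; MPT-30B uses 64):

```python
import numpy as np

def alibi_slopes(n_heads):
    # Head-specific slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    # (this closed form matches the ALiBi paper for power-of-two head counts).
    return np.array([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])

def alibi_bias(n_heads, seq_len):
    # Bias added to attention logits before softmax: each head linearly
    # penalizes attending to tokens farther back in the sequence.
    slopes = alibi_slopes(n_heads)        # (H,)
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]     # key position minus query position
    rel = np.minimum(rel, 0)              # only past tokens are penalized
    return slopes[:, None, None] * rel    # (H, L, L)

bias = alibi_bias(8, 4)
```

Because the penalty is a function of token distance alone, it applies unchanged to sequences longer than any seen in training, which is what makes context extension possible.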
Frequently Asked Questions
Q: What makes this model unique?
MPT-30B-Chat stands out due to its combination of size (29.95B parameters), efficient architecture featuring FlashAttention and ALiBi, and its diverse training data mix including high-quality conversational datasets. The model's ability to handle 8K token contexts while supporting further extension makes it particularly versatile.
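Because ALiBi biases are computed from token distances rather than learned per position, the window can be raised at load time. A configuration sketch using the Hugging Face `transformers` API (the `max_seq_len` field belongs to MPT's custom configuration class; treat the exact field name and value as assumptions):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Raise the context window beyond the 8192 tokens used during training.
config = AutoConfig.from_pretrained("mosaicml/mpt-30b-chat", trust_remote_code=True)
config.max_seq_len = 16384  # assumption: field name from MPT's custom config

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b-chat",
    config=config,
    trust_remote_code=True,
)
```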
Q: What are the recommended use cases?
The model is best suited for chatbot applications, multi-turn conversation, coding assistance, and general dialogue generation. Note, however, that the CC-By-NC-SA-4.0 license restricts it to non-commercial use.
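For chatbot use, the model card indicates that MPT-30B-Chat was fine-tuned on conversations in the ChatML format, so prompts should follow that structure. A minimal formatter as a sketch (the helper name and message layout are illustrative, not part of any official API):

```python
def format_chatml(messages):
    """Render (role, content) pairs into a ChatML-style prompt.

    Roles are typically "system", "user", and "assistant"; the trailing
    open assistant turn cues the model to generate its reply.
    """
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    ("system", "You are a helpful assistant."),
    ("user", "Explain ALiBi in one sentence."),
])
```

The resulting string is passed to the tokenizer as-is; generation should stop at the model's next `<|im_end|>` token.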