mpt-30b-chat

Maintained By
mosaicml

MPT-30B-Chat

PropertyValue
Parameter Count29.95B
LicenseCC-By-NC-SA-4.0
Context Length8192 tokens
ArchitectureModified decoder-only transformer
Release DateJune 22, 2023

What is MPT-30B-Chat?

MPT-30B-Chat is an advanced language model developed by MosaicML, designed specifically for dialogue generation and multi-turn conversations. Built by fine-tuning the base MPT-30B model on diverse datasets including ShareGPT-Vicuna, Camel-AI, GPTeacher, and others, it represents a significant advancement in open-source language models that outperforms the original GPT-3.

Implementation Details

The model employs a modified decoder-only transformer architecture with several innovative features that enhance its performance and efficiency. The architecture includes 48 layers, 64 attention heads, and a dimensional model size of 7168.

  • Implements FlashAttention for improved computational efficiency
  • Uses ALiBi (Attention with Linear Biases) instead of traditional positional embeddings
  • Features an 8K token context window with expansion capability
  • Trained on 64 H100s for approximately 7.6 hours

Core Capabilities

  • Excels at multi-turn conversations and dialogue generation
  • Strong coding abilities due to specialized pretraining data
  • Supports context-length extrapolation via ALiBi
  • Efficient inference and training performance
  • Handles complex instruction following tasks

Frequently Asked Questions

Q: What makes this model unique?

MPT-30B-Chat stands out due to its combination of size (29.95B parameters), efficient architecture featuring FlashAttention and ALiBi, and its diverse training data mix including high-quality conversational datasets. The model's ability to handle 8K token contexts while supporting further extension makes it particularly versatile.

Q: What are the recommended use cases?

The model is best suited for chatbot applications, multi-turn conversations, coding assistance, and general dialogue generation. However, it's important to note that it's licensed for non-commercial use only under CC-By-NC-SA-4.0.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.