MPT-1b-RedPajama-200b

Maintained by: mosaicml

  • Parameter Count: 1.3 Billion
  • License: Apache 2.0
  • Release Date: April 20, 2023
  • Training Infrastructure: 440 A100-40GB GPUs
  • Architecture: 24 layers, 16 attention heads, width 2048

What is mpt-1b-redpajama-200b?

MPT-1b-RedPajama-200b is a 1.3 billion parameter decoder-only transformer developed by MosaicML. It was trained for 200B tokens on the RedPajama dataset, whose mix of data sources was chosen to mirror the one used to train the LLaMA series of models.

Implementation Details

The model departs from a standard transformer design in several ways. It is built on the MosaicML LLM codebase and incorporates the following optimizations; a loading sketch follows the list.

  • Employs ALiBi positional encoding instead of traditional positional embeddings
  • Implements QK LayerNorm for enhanced stability
  • Omits the bias terms used in standard transformer blocks
  • Supports FlashAttention with Triton implementation for optimization
  • Uses the EleutherAI/gpt-neox-20b tokenizer
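
A minimal loading sketch, following the usage pattern on the Hugging Face model card (the exact config attribute for selecting the Triton kernel, `attn_impl` below, is an assumption and may differ between releases):

```python
import torch
import transformers

name = "mosaicml/mpt-1b-redpajama-200b"

# Custom architecture: loading requires trusting the model's remote code.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_impl = "triton"  # optional: enable the Triton FlashAttention kernel

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
model.to(device="cuda:0", dtype=torch.bfloat16)

# The model was trained with the EleutherAI/gpt-neox-20b tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```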

Core Capabilities

  • Efficient text generation with optimized attention mechanisms (see the generation sketch after this list)
  • Handles diverse content types due to varied training data (CommonCrawl, GitHub, Wikipedia, etc.)
  • Supports both CPU and GPU inference with bfloat16 optimization
  • Scalable deployment with FSDP sharding support
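
Continuing from the loading sketch above, a short generation example (the prompt and sampling parameters are illustrative, not recommendations from the model card):

```python
prompt = "Here is a recipe for vegan banana bread:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```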

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness stems from its efficient architecture combining ALiBi, QK LayerNorm, and FlashAttention, trained on a carefully balanced dataset mix matching the Llama training distribution.
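
To give a sense of how ALiBi replaces positional embeddings (a conceptual sketch only, not this model's actual implementation): each attention head adds a penalty to its attention logits that grows linearly with query-key distance, using a fixed per-head slope.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Linear distance penalties added to attention logits (one slope per head)."""
    # Geometric slopes from the ALiBi paper, valid when n_heads is a power of two.
    start = 2 ** (-8 / n_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(n_heads)])
    # distance[i, j] = j - i; positions after the query are removed by the causal mask.
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]
    # Shape (n_heads, seq_len, seq_len), broadcast-added to the attention scores.
    return slopes[:, None, None] * distance[None, :, :]

bias = alibi_bias(n_heads=16, seq_len=8)  # 16 heads, matching this model
```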

Q: What are the recommended use cases?

The model is well-suited to general text generation, research use, and other settings that need an efficient transformer-based language model. Throughput is best when the optional Triton FlashAttention implementation shown above is enabled.
