MPT-1B-RedPajama-200B-Dolly

Maintained By: mosaicml

Parameter Count: 1.3 Billion
License: CC-BY-SA-3.0
Release Date: April 20, 2023
Training Data: RedPajama + Dolly Dataset

What is mpt-1b-redpajama-200b-dolly?

MPT-1B-RedPajama-200B-Dolly is a 1.3 billion parameter decoder-only transformer language model developed by MosaicML. It was pre-trained on 200B tokens of the RedPajama dataset and then fine-tuned on the Databricks Dolly instruction dataset, yielding a compact model that can follow natural-language instructions.

Implementation Details

The model uses a modified decoder-only transformer architecture with 24 layers, 16 attention heads, and a model width (d_model) of 2048. Its implementation incorporates several optimizations (see the loading sketch after this list):

  • Uses ALiBi positional encoding instead of traditional positional embeddings
  • Implements QK LayerNorm for enhanced stability
  • Removes bias terms throughout the network for improved efficiency
  • Supports FlashAttention with Triton implementation
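The snippet below is a minimal loading sketch using the standard Hugging Face `trust_remote_code` pattern required for MosaicML's custom model code; the `attn_impl` configuration attribute used to enable the Triton FlashAttention kernel is an assumption based on the MPT family's published usage and may differ for this exact checkpoint.

```python
import torch
import transformers

MODEL_NAME = "mosaicml/mpt-1b-redpajama-200b-dolly"

# The repository ships custom modeling code, so trust_remote_code=True is required.
config = transformers.AutoConfig.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Assumption: this attribute selects the attention implementation;
# "triton" enables the FlashAttention kernel, the default falls back to plain PyTorch.
config.attn_impl = "triton"

model = transformers.AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    config=config,
    trust_remote_code=True,
)
# ALiBi uses no learned positional embeddings, so no extra position inputs are needed.
model = model.to(device="cuda:0", dtype=torch.bfloat16)
```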

Core Capabilities

  • Text generation with instruction-following abilities
  • Efficient processing with FlashAttention support
  • Handles sequence lengths up to 2048 tokens
  • Compatible with PyTorch and the Hugging Face Transformers library (a short generation sketch follows this list)
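A generation sketch that reuses the `model` loaded above; the GPT-NeoX-20B tokenizer and the Dolly/Alpaca-style instruction template shown here are assumptions about the expected input format rather than a documented interface.

```python
import transformers

# Assumption: the MPT-1B RedPajama models reuse the EleutherAI GPT-NeoX-20B tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Assumption: a Dolly/Alpaca-style instruction prompt; the exact template may differ.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain ALiBi positional encoding in one paragraph.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # the context window tops out at 2048 tokens
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
# Strip the prompt tokens and decode only the newly generated response.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```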

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient architecture modifications like ALiBi and FlashAttention with comprehensive pre-training on the RedPajama dataset, followed by instruction fine-tuning. This makes it particularly suitable for practical applications while maintaining reasonable computational requirements.
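To make the ALiBi point concrete, here is an illustrative, self-contained sketch of how linearly decaying biases are added to attention logits in place of learned positional embeddings; it mirrors the ALiBi paper's construction rather than this model's exact code.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the (n_heads, seq_len, seq_len) additive bias applied to attention logits."""
    # Per-head slopes form a geometric sequence: 2^(-8k/n_heads) for k = 1..n_heads.
    slopes = torch.tensor([2.0 ** (-8.0 * k / n_heads) for k in range(1, n_heads + 1)])
    positions = torch.arange(seq_len)
    # relative[i, j] = j - i: zero on the diagonal, increasingly negative for older keys,
    # so attention to distant tokens is penalized linearly with distance.
    relative = (positions[None, :] - positions[:, None]).float()
    return slopes[:, None, None] * relative

# Inside a simplified attention step, the bias is added to the scaled dot products
# (future positions are still removed by the usual causal mask):
#   scores = (q @ k.transpose(-2, -1)) * head_dim ** -0.5
#   scores = scores + alibi_bias(n_heads, seq_len) + causal_mask
#   attn = scores.softmax(dim=-1)
```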

Q: What are the recommended use cases?

The model is well-suited for text generation tasks, particularly those requiring instruction following. Its moderate size makes it practical for deployment in production environments where larger models might be impractical.
