Maintained by: mosaicml

MPT-7B-StoryWriter

| Property | Value |
|---|---|
| Parameter Count | 6.7B |
| Context Length | 65,536 tokens |
| License | Apache 2.0 |
| Architecture | Modified decoder-only transformer |
| Training Data | Fiction subset of the books3 dataset |

What is MPT-7B-StoryWriter?

MPT-7B-StoryWriter is a language model specialized for reading and writing fictional stories with exceptionally long contexts. Developed by MosaicML, it represents a significant advance in long-form content generation: it handles contexts up to 65k tokens and can even extrapolate beyond that, to around 84k tokens, at inference time.

Implementation Details

The model employs several cutting-edge architectural innovations, including FlashAttention for efficient computation, ALiBi (Attention with Linear Biases) for position encoding, and operates without traditional positional embeddings or bias terms. Built on a modified decoder-only transformer architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.

  • Utilizes FlashAttention for optimal performance
  • Implements ALiBi for position-aware attention
  • Supports dynamic context length extension
  • Trained on 8 A100-80GB GPUs using FSDP and LION optimizer
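
The ALiBi mechanism mentioned above can be illustrated in a few lines of plain Python. This is a minimal sketch, not MosaicML's implementation: the helper names are illustrative, and the slope formula assumes a power-of-two head count, as in the standard ALiBi recipe.

```python
def alibi_slopes(n_heads):
    """Per-head slopes: a geometric sequence 2^(-8/n), 2^(-16/n), ...
    (standard ALiBi recipe for power-of-two head counts)."""
    ratio = 2 ** (-8.0 / n_heads)
    return [ratio ** (h + 1) for h in range(n_heads)]

def alibi_bias(slope, seq_len):
    """Bias matrix added to a head's attention scores before softmax:
    -slope * (i - j), penalizing distant keys linearly. In causal
    attention only entries with j <= i are used."""
    return [[-slope * (i - j) for j in range(seq_len)]
            for i in range(seq_len)]

# Example: the first head's biases for a 4-token sequence.
bias = alibi_bias(alibi_slopes(8)[0], 4)
```

Because the bias depends only on the distance `i - j` rather than on learned position embeddings, it is defined for any sequence length, which is what allows the model to extrapolate past its 65k training window.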

Core Capabilities

  • Long-form story generation and continuation
  • Extended context understanding (65k+ tokens)
  • Efficient processing with FlashAttention support
  • Commercial usage permitted under Apache 2.0 license
  • Compatible with popular frameworks via Hugging Face integration
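
The Hugging Face integration means the model can be loaded with the `transformers` library. A minimal sketch: `trust_remote_code=True` is required because MPT uses a custom model class, the 83,968-token override follows the pattern shown in the upstream model card, and `load_storywriter` is an illustrative helper, not part of any library.

```python
def load_storywriter(
    model_id: str = "mosaicml/mpt-7b-storywriter",
    max_seq_len: int = 83968,
):
    """Load the model with an extended context window.

    Because ALiBi biases depend only on token distance, the usable
    context can be raised at load time beyond the 65k training length.
    """
    # Deferred import: transformers is a heavy dependency and is only
    # needed when the model is actually loaded.
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
    config.max_seq_len = max_seq_len  # override the default 65536
    return AutoModelForCausalLM.from_pretrained(
        model_id, config=config, trust_remote_code=True
    )
```

Note that loading the full 6.7B-parameter checkpoint requires substantial GPU memory; pairing it with a matching tokenizer via `AutoTokenizer.from_pretrained(model_id)` completes a typical generation setup.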

Frequently Asked Questions

Q: What makes this model unique?

The model's standout feature is its ability to handle extremely long context lengths (65k+ tokens) while maintaining coherent story generation, making it particularly suitable for long-form fiction writing and processing entire books.

Q: What are the recommended use cases?

MPT-7B-StoryWriter excels at creative writing tasks, including story generation, continuation, and analysis of long-form fiction. It's particularly useful for applications requiring understanding and generation of extended narrative contexts.
