mpt-7b-storywriter

by MosaicML

MPT-7B-StoryWriter is a 6.7B-parameter LLM optimized for long-form fiction, with a 65,536-token context length enabled by ALiBi attention.

Property            Value
Parameter Count     6.7B
Context Length      65,536 tokens
License             Apache 2.0
Architecture        Modified decoder-only transformer
Training Data       Fiction subset of the books3 dataset

What is MPT-7B-StoryWriter?

MPT-7B-StoryWriter is a language model specialized for reading and writing fictional stories with exceptionally long contexts. Developed by MosaicML, it handles contexts of up to 65k tokens and, because ALiBi encodes position by distance rather than learned embeddings, can even extrapolate beyond that, to around 84k tokens, at inference time.
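Because the context window is a load-time setting rather than a fixed embedding table, it can be raised when the model is instantiated. A minimal sketch following the Hugging Face pattern for MPT models (the `max_seq_len` attribute and the `trust_remote_code` requirement come from MPT's custom modeling code; adjust dtype and device placement to your setup):

```python
import transformers

name = "mosaicml/mpt-7b-storywriter"

# Load the config first so max_seq_len can be raised before the model
# is instantiated; ALiBi lets the model extrapolate past the 65,536
# tokens it was trained on (83968 ≈ 84k here).
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,  # MPT ships its own modeling code on the Hub
    torch_dtype="auto",      # use the checkpoint's native dtype
)
```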

Implementation Details

The model employs several cutting-edge architectural innovations, including FlashAttention for efficient computation, ALiBi (Attention with Linear Biases) for position encoding, and operates without traditional positional embeddings or bias terms. Built on a modified decoder-only transformer architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.
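Those dimensions roughly account for the quoted parameter count. A back-of-the-envelope check (the 50,432-entry vocabulary, the MLP expansion ratio of 4, tied embeddings, and the absence of biases are assumptions about the MPT architecture, not stated in this card; layer norms are omitted as negligible):

```python
# Approximate parameter count from the dimensions above.
d_model = 4096
n_layers = 32
vocab = 50432       # assumed: padded GPT-NeoX-style vocabulary
expansion = 4       # assumed: MLP hidden size = 4 * d_model

attn = 4 * d_model * d_model               # Wq, Wk, Wv, Wo projections
mlp = 2 * d_model * (expansion * d_model)  # up- and down-projection
per_layer = attn + mlp
embeddings = vocab * d_model               # tied with the output head

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")    # close to the quoted 6.7B
```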

  • Utilizes FlashAttention for optimal performance
  • Implements ALiBi for position-aware attention
  • Supports dynamic context length extension
  • Trained on 8 A100-80GB GPUs using FSDP and LION optimizer
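ALiBi replaces positional embeddings with a per-head linear penalty on attention scores proportional to the query–key distance, which is why sequences longer than any seen in training still receive well-defined biases. An illustrative sketch of the power-of-two-heads case from the ALiBi paper (not MPT's actual implementation):

```python
import math

def alibi_slopes(n_heads: int) -> list[float]:
    # Per-head slopes form the geometric sequence
    # 2^(-8/n), 2^(-16/n), ..., 2^(-8) when n is a power of two.
    ratio = 2.0 ** (-8.0 / n_heads)
    return [ratio ** (i + 1) for i in range(n_heads)]

def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    # Bias added to pre-softmax attention scores for one head:
    # -slope * (i - j) for query position i attending to key j <= i.
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

slopes = alibi_slopes(32)  # MPT-7B has 32 attention heads
print(slopes[0], slopes[-1])
```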

Core Capabilities

  • Long-form story generation and continuation
  • Extended context understanding (65k+ tokens)
  • Efficient processing with FlashAttention support
  • Commercial usage permitted under Apache 2.0 license
  • Compatible with popular frameworks via Hugging Face integration

Frequently Asked Questions

Q: What makes this model unique?

The model's standout feature is its ability to handle extremely long context lengths (65k+ tokens) while maintaining coherent story generation, making it particularly suitable for long-form fiction writing and processing entire books.

Q: What are the recommended use cases?

MPT-7B-StoryWriter excels at creative writing tasks, including story generation, continuation, and analysis of long-form fiction. It's particularly useful for applications requiring understanding and generation of extended narrative contexts.
