# MPT-7B-StoryWriter
| Property | Value |
|---|---|
| Parameter Count | 6.7B |
| Context Length | 65,536 tokens |
| License | Apache 2.0 |
| Architecture | Modified decoder-only transformer |
| Training Data | Fiction subset of the books3 dataset |
## What is MPT-7B-StoryWriter?
MPT-7B-StoryWriter is a language model specialized for reading and writing fictional stories with very long contexts. Developed by MosaicML, it handles contexts of up to 65,536 tokens and can even extrapolate beyond its training length to roughly 84k tokens at inference time, making it a significant step forward for long-form content generation.
## Implementation Details
The model employs several cutting-edge architectural innovations, including FlashAttention for efficient computation, ALiBi (Attention with Linear Biases) for position encoding, and operates without traditional positional embeddings or bias terms. Built on a modified decoder-only transformer architecture, it features 32 layers, 32 attention heads, and a model dimension of 4096.
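The stated dimensions can be sanity-checked against the 6.7B parameter count with back-of-the-envelope arithmetic. The sketch below assumes a ~50k-token vocabulary (the GPT-NeoX tokenizer's 50,432), a 4x MLP expansion, tied input/output embeddings, and no bias terms; these assumptions are not spelled out above, though the result lands close to the listed figure.

```python
# Rough parameter count for a 32-layer decoder with d_model = 4096.
# Assumptions (not stated in the table): vocab = 50432, 4x MLP expansion,
# tied embeddings, no bias parameters.
d_model, n_layers, vocab = 4096, 32, 50432

attn = 4 * d_model * d_model       # Q, K, V, and output projections
mlp = 2 * 4 * d_model * d_model    # up- and down-projection with 4x expansion
per_layer = attn + mlp

total = n_layers * per_layer + vocab * d_model  # plus the tied embedding matrix
print(f"{total / 1e9:.2f}B")  # → 6.65B, rounded to 6.7B in the table
```

Because ALiBi removes positional embeddings and the architecture drops biases, essentially all parameters live in the attention and MLP weight matrices plus the embedding table.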
- Utilizes FlashAttention for optimal performance
- Implements ALiBi for position-aware attention
- Supports dynamic context length extension
- Fine-tuned for the 65k context length on 8x A100-80GB GPUs using FSDP and the Lion optimizer
## Core Capabilities
- Long-form story generation and continuation
- Extended context understanding (65k+ tokens)
- Efficient processing with FlashAttention support
- Commercial usage permitted under Apache 2.0 license
- Compatible with popular frameworks via Hugging Face integration
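A minimal loading sketch via Hugging Face `transformers`. The repo id `mosaicml/mpt-7b-storywriter` and the `trust_remote_code=True` flag follow the model's Hub distribution (MPT ships custom modeling code); raising `config.max_seq_len` past 65,536 relies on ALiBi extrapolation. Running this downloads roughly 13 GB of weights, so the heavy imports are kept inside the function.

```python
def load_storywriter(max_seq_len: int = 65536):
    """Load MPT-7B-StoryWriter; downloads the weights on first call."""
    # Requires `pip install transformers` (plus torch); imports are deferred
    # so the function can be defined without the libraries installed.
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(
        "mosaicml/mpt-7b-storywriter", trust_remote_code=True
    )
    # ALiBi lets the context be extended beyond the 65,536-token training
    # length, e.g. to ~84k, by overriding max_seq_len before loading.
    config.max_seq_len = max_seq_len
    return AutoModelForCausalLM.from_pretrained(
        "mosaicml/mpt-7b-storywriter", config=config, trust_remote_code=True
    )
```

Generation then works through the usual `transformers` pipeline or `model.generate` path, with the extended context available immediately.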
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's standout feature is its ability to handle extremely long context lengths (65k+ tokens) while maintaining coherent story generation, making it particularly suitable for long-form fiction writing and processing entire books.
**Q: What are the recommended use cases?**
MPT-7B-StoryWriter excels at creative writing tasks, including story generation, continuation, and analysis of long-form fiction. It's particularly useful for applications requiring understanding and generation of extended narrative contexts.