# MPT-7B-Storywriter-GGML
| Property | Value |
|---|---|
| Parameter Count | 6.7B |
| License | Apache 2.0 |
| Context Length | 65,536 tokens |
| Architecture | Modified decoder-only transformer |
| Base Model | MPT-7B |
## What is MPT-7B-Storywriter-GGML?
MPT-7B-Storywriter-GGML is a GGML-quantized version of MosaicML's story-focused language model, optimized for CPU inference. It is designed for reading and writing fictional stories with exceptionally long contexts: it was trained with a 65,536-token context length and, thanks to ALiBi, can extrapolate beyond it. The model has been converted to several quantization levels (4-bit, 5-bit, and 8-bit) to accommodate different hardware capabilities and performance requirements.
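The relationship between quantization level and file size can be sketched with back-of-envelope arithmetic. The block layouts below (20 bytes per 32-weight block for q4_0, 36 bytes for q8_0) are assumptions based on early GGML formats, not figures from this card; real files also contain some non-quantized tensors, so exact sizes differ slightly.

```python
# Rough file-size estimate for GGML block quantization.
# Assumed layouts (early GGML): q4_0 stores 32 weights as 32 x 4-bit
# values (16 bytes) plus one fp32 scale (4 bytes) = 20 bytes/block;
# q8_0 stores 32 int8 values (32 bytes) plus one fp32 scale = 36 bytes/block.

def ggml_size_gb(n_params: float, bytes_per_block: int, block_size: int = 32) -> float:
    """Approximate quantized file size in GB (1 GB = 1e9 bytes)."""
    bits_per_weight = bytes_per_block * 8 / block_size
    return n_params * bits_per_weight / 8 / 1e9

N = 6.7e9  # MPT-7B parameter count
print(round(ggml_size_gb(N, 20), 2))  # q4_0: 5 bits/weight -> ~4.19 GB
print(round(ggml_size_gb(N, 36), 2))  # q8_0: 9 bits/weight -> ~7.54 GB
```

Both estimates land close to the 4.21GB and 7.58GB endpoints listed for the released files, which suggests the per-block overhead dominates the size difference between quantization levels.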
## Implementation Details
The model incorporates several notable architectural features: FlashAttention for efficient attention computation, ALiBi (Attention with Linear Biases) in place of learned positional embeddings, and modifications such as QK LayerNorm. It uses the EleutherAI/gpt-neox-20b tokenizer and has been fine-tuned on a curated fiction subset of the books3 dataset.
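ALiBi is what enables the long-context behavior: instead of positional embeddings, it adds a per-head linear penalty proportional to the query-key distance, which extrapolates naturally past the training length. A minimal sketch of the standard scheme (for head counts that are powers of two, as with MPT-7B's 32 heads):

```python
def alibi_slopes(n_heads: int) -> list[float]:
    """Per-head ALiBi slopes: a geometric sequence starting at 2^(-8/n_heads).
    This closed form applies when n_heads is a power of two."""
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(slope: float, seq_len: int) -> list[list[float]]:
    """Additive attention bias for one head: -slope * (i - j) for each
    query position i and key position j <= i (causal attention)."""
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

slopes = alibi_slopes(32)        # 32 heads, as in MPT-7B
print(alibi_bias(slopes[0], 3))  # small causal bias triangle for head 0
```

Because the bias is a fixed linear function of distance, nothing in it is tied to a maximum sequence length, which is why the model can be run past its 65k training context.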
- Multiple quantization options ranging from 4.21GB to 7.58GB file sizes
- Compatible with KoboldCpp, ctransformers, GPT4All-UI, and rustformers' llm
- Supports extrapolation beyond training context length through ALiBi
- 32 layers with 32 attention heads and 4096 dimensional embeddings
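The listed shape (32 layers, d_model 4096) can be sanity-checked against the 6.7B parameter count. The sketch below assumes details not stated in this card: a 4x MLP expansion, no bias terms, and a padded vocabulary of ~50,432 (the gpt-neox-20b tokenizer's vocabulary rounded up).

```python
# Rough parameter count from the listed architecture.
# Assumptions (hypothetical, not from the card): 4x MLP expansion,
# no biases, tied input/output embeddings, vocab padded to 50432.
d_model, n_layers, vocab = 4096, 32, 50432

embed = vocab * d_model            # token embeddings (shared with output head)
attn = 4 * d_model * d_model       # Wq, Wk, Wv, Wo projections
mlp = 2 * d_model * (4 * d_model)  # up- and down-projection
total = embed + n_layers * (attn + mlp)

print(f"{total / 1e9:.2f}B")  # ~6.65B, within ~1% of the quoted 6.7B
```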
## Core Capabilities
- Long-form story generation with coherent narratives
- Demonstrated context extrapolation up to 84k tokens, beyond the 65k training length
- Efficient CPU inference with various quantization options
- Creative writing and story continuation
- Memory-efficient operation with different RAM requirements based on quantization
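One reason RAM requirements vary so much in practice is the KV cache, which grows linearly with context length. A back-of-envelope estimate, assuming a half-precision (2-byte) cache (the actual runtime may use fp32, doubling these figures):

```python
def kv_cache_gb(seq_len: int, n_layers: int = 32, d_model: int = 4096,
                bytes_per_val: int = 2) -> float:
    """Approximate KV-cache size: two tensors (K and V), each of shape
    [n_layers, seq_len, d_model], at bytes_per_val bytes per value."""
    return 2 * n_layers * seq_len * d_model * bytes_per_val / 1e9

print(round(kv_cache_gb(2048), 2))   # modest 2k context: ~1.07 GB
print(round(kv_cache_gb(65536), 2))  # full 65k context: ~34.36 GB
```

At the full 65k context the cache can dwarf the quantized weights themselves, so the model file size alone understates the RAM needed for very long stories.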
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its extraordinary context length capability and specific optimization for story writing tasks, combined with efficient GGML quantization for CPU deployment.
Q: What are the recommended use cases?
The model excels at creative writing tasks, story continuation, and handling long-form narrative content. It's particularly suitable for applications requiring extended context understanding and generation.