MPT-7B-Storywriter-GGML

Maintained by: TheBloke


Property          Value
Parameter Count   6.7B
License           Apache 2.0
Context Length    65,536 tokens
Architecture      Modified decoder-only transformer
Base Model        MPT-7B

What is MPT-7B-Storywriter-GGML?

MPT-7B-Storywriter-GGML is a GGML-quantized version of MosaicML's story-focused language model, optimized for CPU inference. It is designed for reading and writing fictional stories with exceptionally long contexts, supporting a 65k-token window and extrapolating beyond it at inference time. The model has been converted to several quantization levels (4-bit, 5-bit, and 8-bit) to accommodate different hardware capabilities and performance requirements.
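As a rough illustration, one of the quantized files can be run on CPU with the ctransformers Python bindings (one of the compatible runtimes listed below). The model_file name here is an assumption about TheBloke's usual naming pattern; check the repository's file list for the exact 4-bit, 5-bit, or 8-bit file you want:

```python
# Minimal CPU-inference sketch using ctransformers.
# The model_file name is illustrative; substitute the actual .bin you downloaded.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/MPT-7B-Storywriter-GGML",
    model_type="mpt",
    model_file="mpt-7b-storywriter.ggmlv3.q4_0.bin",  # assumed file name
)

prompt = "It was a dark and stormy night, and the lighthouse keeper"
print(llm(prompt, max_new_tokens=200, temperature=0.8))
```

KoboldCpp, GPT4All-UI, and rustformers' llm load the same files through their own interfaces.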

Implementation Details

The model incorporates several advanced technical features, including FlashAttention for efficient computation, ALiBi (Attention with Linear Biases) for position encoding, and architecture modifications such as QK LayerNorm. It uses the EleutherAI/gpt-neox-20b tokenizer and has been fine-tuned on a curated fiction subset of the books3 dataset.
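ALiBi is the piece that enables the long-context behaviour: instead of learned position embeddings, each attention head adds a linear distance penalty to its attention scores, which is what allows extrapolation past the training length. A minimal, illustrative sketch of that bias (not the exact MPT kernel):

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Per-head linear distance penalties added to attention scores before softmax."""
    # Standard ALiBi slope schedule for a power-of-two head count:
    # a geometric sequence starting at 2**(-8 / n_heads).
    start = 2 ** (-8.0 / n_heads)
    slopes = torch.tensor([start ** (i + 1) for i in range(n_heads)])
    # distance[i, j] = j - i, clamped so only past positions are penalized
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    # Shape (n_heads, seq_len, seq_len): larger negative bias for farther keys.
    return slopes[:, None, None] * distance

bias = alibi_bias(n_heads=32, seq_len=8)
print(bias.shape)  # torch.Size([32, 8, 8])
```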

  • Multiple quantization options, with file sizes ranging from 4.21 GB to 7.58 GB
  • Compatible with KoboldCpp, ctransformers, GPT4All-UI, and rustformers' llm
  • Supports extrapolation beyond training context length through ALiBi
  • 32 layers with 32 attention heads and 4096-dimensional embeddings (see the back-of-envelope check after this list)
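Those dimensions line up with the 6.7B parameter count in the table above. A quick sanity check, assuming the standard 4x MLP expansion and a ~50k-entry vocabulary (both assumptions, not figures from this card):

```python
# Back-of-envelope parameter count from the dimensions listed above.
d_model, n_layers, vocab = 4096, 32, 50_000

attn = 4 * d_model**2                 # Q, K, V and output projections
mlp = 2 * (4 * d_model) * d_model     # up- and down-projection, 4x expansion
per_layer = attn + mlp                # 12 * d_model**2
embeddings = vocab * d_model          # tied input/output embedding

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~6.65B, matching the 6.7B in the table
```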

Core Capabilities

  • Long-form story generation with coherent narratives
  • Extended context handling up to roughly 84k tokens via ALiBi extrapolation beyond the 65k training window
  • Efficient CPU inference with various quantization options
  • Creative writing and story continuation
  • Memory-efficient operation, with RAM requirements that depend on the chosen quantization level (see the rough estimate after this list)
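RAM use is driven by two things: the size of the quantized weights file you pick and how much of the long context window you actually fill, since the attention key/value cache grows linearly with the number of tokens. A rough estimate, assuming a 16-bit cache and the layer dimensions listed earlier (an approximation, not a measured figure):

```python
def kv_cache_gb(n_tokens: int, n_layers: int = 32, d_model: int = 4096,
                bytes_per_value: int = 2) -> float:
    # Each layer caches one key and one value vector of size d_model per token.
    return 2 * n_tokens * n_layers * d_model * bytes_per_value / 1024**3

print(kv_cache_gb(2048))   # ~1 GB on top of the weights for a short context
print(kv_cache_gb(65536))  # ~32 GB if the full 65k window is actually filled
```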

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its extraordinary context length capability and specific optimization for story writing tasks, combined with efficient GGML quantization for CPU deployment.

Q: What are the recommended use cases?

The model excels at creative writing tasks, story continuation, and handling long-form narrative content. It's particularly suitable for applications requiring extended context understanding and generation.
