Yarn-Mistral-7B-128k-AWQ

Maintained By
TheBloke

  • Parameter Count: 7 Billion
  • Context Length: 128,000 tokens
  • License: Apache 2.0
  • Paper: arXiv:2309.00071
  • Quantization: 4-bit AWQ

What is Yarn-Mistral-7B-128k-AWQ?

Yarn-Mistral-7B-128k-AWQ is a quantized version of the Yarn-extended Mistral language model, optimized for efficient inference while maintaining impressive performance. This model features a massive 128k token context window, making it particularly suitable for processing long documents and complex conversations. The AWQ quantization reduces the model size to 4.15GB while preserving quality comparable to higher-precision versions.

Implementation Details

The model uses state-of-the-art AWQ (Activation-aware Weight Quantization) technology, operating at 4-bit precision with a group size of 128. It's compatible with major inference frameworks including Text Generation WebUI, vLLM, and Hugging Face's TGI.

  • Quantization Method: 4-bit AWQ
  • Context Length: 128k tokens
  • Base Architecture: Mistral-7B
  • Model Size: 4.15GB
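To make the "4-bit precision with a group size of 128" concrete, here is a minimal sketch of group-wise symmetric 4-bit quantization. Note this is a generic illustration, not the actual AWQ algorithm: real AWQ additionally rescales salient weight channels using activation statistics before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_group(w, n_bits=4):
    # Symmetric per-group quantization: pick a scale so the
    # largest-magnitude weight maps to the edge of the 4-bit range.
    qmax = 2 ** (n_bits - 1) - 1          # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def quantize_weights(weights, group_size=128, n_bits=4):
    # Split a flat weight vector into groups of `group_size`,
    # each quantized with its own scale (the "group size 128" above).
    groups = weights.reshape(-1, group_size)
    results = [quantize_group(g, n_bits) for g in groups]
    q = np.stack([r[0] for r in results])
    scales = np.array([r[1] for r in results])
    return q, scales

def dequantize(q, scales):
    # Recover approximate weights: one scale per group.
    return q * scales[:, None]

# Example: 256 weights -> two groups of 128, each with its own scale.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=256).astype(np.float32)
q, scales = quantize_weights(w, group_size=128)
w_hat = dequantize(q, scales).reshape(-1)
```

Per-group scales keep the quantization error bounded by half a scale step for each group, which is why grouped 4-bit schemes lose so little quality compared to a single scale for the whole tensor.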

Core Capabilities

  • Long-form text generation with extended context awareness
  • Efficient inference with minimal quality degradation
  • Improved perplexity metrics across various context lengths
  • Maintains competitive performance on standard benchmarks (ARC-c, HellaSwag, MMLU)

Frequently Asked Questions

Q: What makes this model unique?

This model combines the Mistral architecture with the YaRN context-extension method for long-context processing, while using AWQ quantization for efficient deployment. It achieves strong perplexity scores across different context lengths (2.19 at 128k) while maintaining a small deployment footprint.
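For reference, perplexity is simply the exponential of the mean negative log-likelihood per token, so a score like 2.19 means the model is, on average, about as uncertain as a uniform choice among ~2.19 tokens. A minimal sketch (the log-probabilities below are made-up illustrative values, not outputs of this model):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(mean negative log-likelihood per token).
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities (natural log) from a model:
logprobs = [-0.5, -1.2, -0.3, -0.9]
ppl = perplexity(logprobs)  # lower is better; 1.0 would be a perfect model
```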

Q: What are the recommended use cases?

The model excels at tasks requiring long context understanding, such as document analysis, extended conversations, and complex text generation. It's particularly suitable for deployment in resource-constrained environments where efficient inference is crucial.
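Before sending a long document to the model, it helps to check that it fits inside the 128k window with room left for the response. The sketch below uses a rough ~4-characters-per-token heuristic for English text; the names, limits, and heuristic are assumptions for illustration, and an exact count requires the model's own tokenizer.

```python
def approx_token_count(text, chars_per_token=4):
    # Rough heuristic: English prose averages ~4 characters per token.
    # Use the actual Mistral tokenizer for exact counts.
    return max(1, len(text) // chars_per_token)

def fits_in_context(text, context_limit=128_000, reserve_for_output=2_000):
    # Leave headroom in the 128k window for the generated response.
    return approx_token_count(text) <= context_limit - reserve_for_output

doc = "word " * 50_000          # ~250k characters
ok = fits_in_context(doc)       # ~62,500 estimated tokens: fits comfortably
```

If a document does not fit, it must be truncated, summarized, or split into chunks before inference, since tokens beyond the context limit are simply not seen by the model.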
