# Pygmalion-13B-SuperHOT-8K-GPTQ
| Property | Value |
|---|---|
| Base Model Size | 13B parameters |
| Quantization | 4-bit GPTQ |
| Context Length | 8192 tokens |
| Model Hub | Hugging Face |
| Author | TheBloke |
## What is Pygmalion-13B-SuperHOT-8K-GPTQ?
This model combines PygmalionAI's dialogue-focused Pygmalion-13B with SuperHOT's extended-context modification, quantized to 4-bit with GPTQ. The quantization substantially reduces VRAM requirements while largely preserving output quality, and the SuperHOT merge extends the usable context window to 8K tokens.
## Implementation Details
The model uses GPTQ quantization with a group size of 128, a common accuracy/performance trade-off. It is designed to work with the ExLlama and AutoGPTQ backends and supports extended context lengths through the `compress_pos_emb` setting: 2 for a 4096-token context, 4 for the full 8192 tokens.
- 4-bit quantization with 128 group size
- Supports up to 8K context length
- Optimized for dialogue generation
- Compatible with text-generation-webui
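The `compress_pos_emb` setting works by linear position interpolation: position indices are divided by the compression factor so that an extended sequence still falls inside the positional range the base LLaMA model was trained on (2048 tokens). The sketch below illustrates the arithmetic only; the function name and signature are illustrative, not part of any backend's API.

```python
def interpolated_positions(seq_len: int, compress_pos_emb: int, trained_ctx: int = 2048):
    """Scale position indices so an extended sequence fits the range the
    base model's rotary embeddings were trained on (linear RoPE
    interpolation, the mechanism behind SuperHOT's extended context).

    trained_ctx is the original LLaMA training context (2048 tokens).
    """
    positions = [i / compress_pos_emb for i in range(seq_len)]
    # Every scaled position must land inside the trained range.
    assert positions[-1] < trained_ctx, "compress_pos_emb too small for seq_len"
    return positions

# With compress_pos_emb = 4, all 8192 token positions map into [0, 2048).
pos = interpolated_positions(8192, 4)
print(pos[-1])  # 2047.75
```

This is why the card lists `compress_pos_emb = 2` for 4096 tokens and `4` for 8192: each factor maps the longer sequence back into the trained 2048-position range.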
## Core Capabilities
- Enhanced conversational abilities from Pygmalion base model
- Extended context handling up to 8K tokens
- Efficient memory usage through 4-bit quantization
- Persona-based dialogue generation
- Support for both CPU and GPU inference
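Persona-based dialogue generation with Pygmalion models relies on a specific prompt layout: a persona block, a `<START>` separator, then alternating turns, with the character's name left open for the model to complete. A minimal prompt builder, sketched here as a plain helper (the function itself is illustrative, not part of any library):

```python
def build_pygmalion_prompt(char_name, persona, history, user_message):
    """Assemble a prompt in the Pygmalion dialogue format:
    a persona block, a <START> separator, then alternating turns.
    history is a list of (speaker, text) pairs."""
    lines = [f"{char_name}'s Persona: {persona}", "<START>"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"You: {user_message}")
    lines.append(f"{char_name}:")  # left open for the model to complete
    return "\n".join(lines)

prompt = build_pygmalion_prompt(
    "Aria", "A cheerful ship's navigator.",
    [("You", "Where are we headed?"), ("Aria", "Straight on, captain!")],
    "Plot a safer course.")
print(prompt)
```

Feeding the model a prompt in this shape, rather than free-form text, is what makes the persona conditioning effective.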
## Frequently Asked Questions
**Q: What makes this model unique?**
This model uniquely combines Pygmalion's conversational capabilities with SuperHOT's extended context handling, all while maintaining efficiency through 4-bit quantization. The 8K context window is particularly notable, allowing for longer, more coherent conversations.
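Even with an 8K window, a long chat eventually overflows it, so clients typically drop the oldest turns before each request. A rough sketch of that trimming logic, using a crude characters-per-token estimate (a real client would count with the model's tokenizer; the function and its defaults are illustrative):

```python
def trim_history(history, budget_tokens=8192, reserve=512, chars_per_token=4):
    """Drop the oldest turns until a rough token estimate fits the 8K
    window, keeping `reserve` tokens free for the model's reply.

    Token count is approximated as total characters / chars_per_token;
    substitute the model's tokenizer for an exact count.
    """
    def estimate(turns):
        return sum(len(t) for t in turns) // chars_per_token

    trimmed = list(history)
    while trimmed and estimate(trimmed) > budget_tokens - reserve:
        trimmed.pop(0)  # discard the oldest turn first
    return trimmed
```

Trimming from the front preserves the most recent context, which matters most for coherent replies.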
**Q: What are the recommended use cases?**
The model is designed for fictional conversation and entertainment. It excels at character-based dialogue and, thanks to the extended context window, stays coherent over longer conversations. Note that it is not fine-tuned for factual accuracy and should not be used in safety-critical applications.