# Pygmalion-13B-SuperHOT-8K-GPTQ
| Property | Value |
|---|---|
| Base Model Size | 13B parameters |
| Quantization | 4-bit GPTQ |
| Context Length | 8192 tokens |
| Model Hub | Hugging Face |
| Author | TheBloke |
## What is Pygmalion-13B-SuperHOT-8K-GPTQ?
This model combines PygmalionAI's dialogue-focused Pygmalion-13B with SuperHOT's extended-context modification, quantized to 4-bit with GPTQ. The quantization substantially reduces VRAM requirements while largely preserving output quality, and the SuperHOT merge extends the usable context window to 8K tokens.
## Implementation Details
The model uses GPTQ quantization with a group size of 128, a common accuracy/performance trade-off. It is designed to work with the ExLlama and AutoGPTQ backends and supports extended context lengths through the `compress_pos_emb` setting: 2 for a 4096-token context, 4 for the full 8192 tokens.
- 4-bit quantization with 128 group size
- Supports up to 8K context length
- Optimized for dialogue generation
- Compatible with text-generation-webui
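The `compress_pos_emb` setting works by linear position interpolation: position indices are divided by the compression factor so that an extended sequence still falls inside the positional range the base LLaMA model was trained on (2048 tokens). The sketch below illustrates the arithmetic only; the function name and signature are illustrative, not part of any backend's API.

```python
def interpolated_positions(seq_len: int, compress_pos_emb: int, trained_ctx: int = 2048):
    """Scale position indices so an extended sequence fits the range the
    base model's rotary embeddings were trained on (linear RoPE
    interpolation, the mechanism behind SuperHOT's extended context).

    trained_ctx is the original LLaMA training context (2048 tokens).
    """
    positions = [i / compress_pos_emb for i in range(seq_len)]
    # Every scaled position must land inside the trained range.
    assert positions[-1] < trained_ctx, "compress_pos_emb too small for seq_len"
    return positions

# With compress_pos_emb = 4, all 8192 token positions map into [0, 2048).
pos = interpolated_positions(8192, 4)
print(pos[-1])  # 2047.75
```

This is why the card lists `compress_pos_emb = 2` for 4096 tokens and `4` for 8192: each factor maps the longer sequence back into the trained 2048-position range.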
## Core Capabilities
- Enhanced conversational abilities from Pygmalion base model
- Extended context handling up to 8K tokens
- Efficient memory usage through 4-bit quantization
- Persona-based dialogue generation
- Support for both CPU and GPU inference
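Persona-based dialogue generation with Pygmalion models relies on a specific prompt layout: a persona block, a `<START>` separator, then alternating turns, with the character's name left open for the model to complete. A minimal prompt builder, sketched here as a plain helper (the function itself is illustrative, not part of any library):

```python
def build_pygmalion_prompt(char_name, persona, history, user_message):
    """Assemble a prompt in the Pygmalion dialogue format:
    a persona block, a <START> separator, then alternating turns.
    history is a list of (speaker, text) pairs."""
    lines = [f"{char_name}'s Persona: {persona}", "<START>"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"You: {user_message}")
    lines.append(f"{char_name}:")  # left open for the model to complete
    return "\n".join(lines)

prompt = build_pygmalion_prompt(
    "Aria", "A cheerful ship's navigator.",
    [("You", "Where are we headed?"), ("Aria", "Straight on, captain!")],
    "Plot a safer course.")
print(prompt)
```

Feeding the model a prompt in this shape, rather than free-form text, is what makes the persona conditioning effective.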
## Frequently Asked Questions
**Q: What makes this model unique?**
This model uniquely combines Pygmalion's conversational capabilities with SuperHOT's extended context handling, all while maintaining efficiency through 4-bit quantization. The 8K context window is particularly notable, allowing for longer, more coherent conversations.
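Even with an 8K window, a long chat eventually overflows it, so clients typically drop the oldest turns before each request. A rough sketch of that trimming logic, using a crude characters-per-token estimate (a real client would count with the model's tokenizer; the function and its defaults are illustrative):

```python
def trim_history(history, budget_tokens=8192, reserve=512, chars_per_token=4):
    """Drop the oldest turns until a rough token estimate fits the 8K
    window, keeping `reserve` tokens free for the model's reply.

    Token count is approximated as total characters / chars_per_token;
    substitute the model's tokenizer for an exact count.
    """
    def estimate(turns):
        return sum(len(t) for t in turns) // chars_per_token

    trimmed = list(history)
    while trimmed and estimate(trimmed) > budget_tokens - reserve:
        trimmed.pop(0)  # discard the oldest turn first
    return trimmed
```

Trimming from the front preserves the most recent context, which matters most for coherent replies.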
**Q: What are the recommended use cases?**
The model is designed for fictional conversation and entertainment. It excels at character-based dialogue and, thanks to the extended context window, stays coherent over longer conversations. Note that it is not fine-tuned for factual accuracy and should not be used in safety-critical applications.