# Pygmalion-13B-4bit-128g
| Property | Value |
|---|---|
| Base Model | Pygmalion-13B |
| Quantization | 4-bit GPTQ |
| Group Size | 128 |
| Format | SafeTensors |
| Source | Hugging Face |
## What is pygmalion-13b-4bit-128g?
Pygmalion-13B-4bit-128g is a quantized version of the original Pygmalion-13B model, optimized for efficient deployment while preserving output quality. It stores weights in 4-bit precision with a group size of 128, quantized via the GPTQ CUDA implementation, which shrinks the weight footprint to roughly a quarter of the fp16 original while preserving the model's core capabilities. Because the model is known for generating adult content, it ships with explicit content warnings and age restrictions.
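To make the footprint reduction concrete, here is a back-of-envelope estimate of weight storage at fp16 versus 4-bit precision. The 13B parameter count is nominal, and the figures ignore activations, the KV cache, and per-group quantization metadata, so treat them as rough lower bounds on real GPU usage.

```python
# Rough weight-storage comparison; the parameter count is nominal and
# runtime memory (activations, KV cache) is excluded.
params = 13e9

fp16_gib = params * 2 / 2**30   # 2 bytes per weight at fp16
int4_gib = params / 2 / 2**30   # 0.5 bytes per weight at 4 bits

print(f"fp16 weights : ~{fp16_gib:.1f} GiB")   # ~24.2 GiB
print(f"4-bit weights: ~{int4_gib:.1f} GiB")   # ~6.1 GiB
```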
## Implementation Details
The model was quantized using the GPTQ-for-LLaMa framework with true-sequential processing enabled and a group size of 128. It is stored in the safetensors format, which loads efficiently and, unlike pickle-based checkpoints, cannot execute arbitrary code on load. The key settings are summarized below, followed by a sketch of how an equivalent quantization could be reproduced.
- GPTQ CUDA quantization implementation
- True-sequential processing enabled
- 128 group size optimization
- safetensors format storage
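The original checkpoint was produced with GPTQ-for-LLaMa, but the same settings map directly onto the AutoGPTQ library, which is easier to script. The sketch below is illustrative rather than a record of the actual quantization run; the base repo id, calibration text, and output directory are all assumptions.

```python
# Illustrative reproduction with AutoGPTQ (the original used
# GPTQ-for-LLaMa): 4-bit weights, group size 128, true-sequential.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "PygmalionAI/pygmalion-13b"  # assumed base repo id

quantize_config = BaseQuantizeConfig(
    bits=4,                # 4-bit weights
    group_size=128,        # one scale/zero-point per 128 weights
    desc_act=False,
    true_sequential=True,  # quantize layers one at a time, in order
)

tokenizer = AutoTokenizer.from_pretrained(base_model, use_fast=True)
# Real runs use a proper calibration set (e.g. a few hundred samples);
# a single sentence is used here only to keep the sketch short.
examples = [tokenizer("Placeholder calibration text for GPTQ.")]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)  # runs GPTQ against the calibration batch
model.save_quantized("pygmalion-13b-4bit-128g", use_safetensors=True)
```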
## Core Capabilities
- Efficient memory usage through 4-bit quantization
- Adult content generation capabilities
- Maintains base model performance with reduced size
- Compatible with standard LLaMA-based frameworks (see the loading sketch below)
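A minimal loading sketch, assuming the quantized checkpoint sits in a local directory (or Hugging Face repo) named "pygmalion-13b-4bit-128g" and that a CUDA device is available; adjust both to your setup.

```python
# Minimal GPTQ checkpoint loading with AutoGPTQ; the path and device
# are assumptions.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "pygmalion-13b-4bit-128g"  # assumed local path or repo id

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    device="cuda:0",
    use_safetensors=True,  # the checkpoint ships as safetensors
)
```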
## Frequently Asked Questions
**Q: What makes this model unique?**
It pairs aggressive 4-bit compression with the full capabilities of Pygmalion-13B. The group size of 128 is the central trade-off: storing one scale and zero-point per 128 weights keeps quantization error low while adding only a fraction of a bit of overhead per weight.
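To put numbers on that trade-off, the sketch below estimates the metadata overhead for a few group sizes. The exact packed layout differs between GPTQ implementations, so the per-group costs (a 16-bit scale and a 4-bit zero-point) are assumptions.

```python
# Approximate extra storage per weight from group-wise quantization
# metadata; the 16-bit scale / 4-bit zero-point layout is an assumption.
def extra_bits_per_weight(group_size, scale_bits=16, zero_bits=4):
    return (scale_bits + zero_bits) / group_size

for g in (32, 128, 1024):
    print(f"group size {g:4d}: +{extra_bits_per_weight(g):.3f} bits/weight")
# Smaller groups track the weight distribution more closely (better
# accuracy) but store more scales; 128 is a common middle ground.
```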
**Q: What are the recommended use cases?**
The model is designed for adult-oriented text generation and should only be used by adults in appropriate contexts. It is particularly well suited to deployments where GPU memory is limited but generation quality still matters.
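Continuing from the loading sketch above, here is one way to prompt the model. The persona / `<START>` / `You:` layout follows the conversational format documented for the Pygmalion model family; the character name and persona text are placeholders.

```python
# Conversational prompt in the Pygmalion persona format; persona text
# and character name are placeholders.
prompt = (
    "Assistant's Persona: Assistant is a friendly, in-character "
    "conversational partner.\n"
    "<START>\n"
    "You: Hi there! What should we talk about?\n"
    "Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(
    **inputs, max_new_tokens=128, do_sample=True, temperature=0.7
)
# Print only the newly generated continuation.
print(tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```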