Pygmalion-13B-4bit-128g

Maintained by: notstoic

Property        Value
Base Model      Pygmalion-13B
Quantization    4-bit GPTQ
Group Size      128
Format          SafeTensors
Source          Hugging Face

What is pygmalion-13b-4bit-128g?

Pygmalion-13B-4bit-128g is a quantized version of the original Pygmalion-13B model, optimized for efficient deployment while preserving performance. It uses 4-bit precision with a quantization group size of 128, produced with the GPTQ CUDA implementation, which substantially reduces the model's memory footprint while retaining its core capabilities. Note that the model is known for adult (NSFW) content generation and ships with explicit content warnings and age restrictions.
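
As a rough, back-of-the-envelope illustration of that footprint reduction (my figures, not from the model card): the weights alone drop from about 26 GB at 16-bit precision to roughly 6.5 GB at 4-bit, before the small overhead of per-group scales.

```python
# Back-of-the-envelope weight-storage estimate (illustrative only;
# excludes activations, the KV cache, and the small per-group
# scale/zero-point overhead introduced by the 128 group size).
params = 13e9  # 13 billion parameters

fp16_gb = params * 16 / 8 / 1e9  # 16 bits per weight -> ~26 GB
int4_gb = params * 4 / 8 / 1e9   # 4 bits per weight  -> ~6.5 GB

print(f"fp16 weights : ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```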

Implementation Details

The model was quantized with the GPTQ-for-LLaMa framework using true-sequential processing and a group size of 128. The weights are stored in the safetensors format, which offers security and loading-efficiency benefits over pickle-based checkpoints.

  • GPTQ CUDA quantization implementation
  • True-sequential processing enabled
  • Group size of 128
  • SafeTensors weight storage (see the loading sketch below)
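
For reference, a minimal loading sketch follows. The choice of the AutoGPTQ library and the exact repository id are my assumptions (the card itself only names GPTQ-for-LLaMa); any GPTQ-compatible loader with CUDA kernels should work similarly.

```python
# Minimal loading sketch, assuming the AutoGPTQ library and the repo id below.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "notstoic/pygmalion-13b-4bit-128g"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    use_safetensors=True,  # weights are shipped as safetensors
    device="cuda:0",       # GPTQ CUDA kernels require a GPU
)
```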

Core Capabilities

  • Efficient memory usage through 4-bit quantization
  • Adult (NSFW) content generation capabilities
  • Retains the base model's generation quality at a much smaller size
  • Compatible with standard LLaMA-based inference frameworks (see the generation sketch below)
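
Continuing the loading sketch above, text generation goes through the standard transformers interface. The prompt below is purely illustrative; Pygmalion models typically expect a persona/dialogue-style prompt, and the exact template is not specified here.

```python
# Generation sketch (continues from the loading example; the prompt
# format is illustrative, not an official Pygmalion template).
prompt = "Character's Persona: A witty tavern keeper.\nYou: Hello there!\nCharacter:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```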

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient 4-bit quantization, which retains the capabilities of the full Pygmalion-13B model. The group size of 128 strikes a balance between compression and accuracy: each group of 128 weights shares its own quantization scale, so smaller groups track local weight ranges more closely at the cost of storing more scales.
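
To make that trade-off concrete, here is a simplified sketch of group-wise 4-bit quantization (my illustration; real GPTQ additionally uses calibration data and error-compensating weight updates rather than plain rounding):

```python
import torch

def quantize_groupwise(w: torch.Tensor, group_size: int = 128, bits: int = 4):
    """Symmetric round-to-nearest quantization with one scale per group.

    Simplified for illustration; GPTQ proper also applies calibration-
    driven, error-compensating updates to the remaining weights.
    """
    qmax = 2 ** bits - 1                                # 15 for 4-bit
    groups = w.reshape(-1, group_size)                  # one row per group
    scale = groups.abs().amax(dim=1, keepdim=True) / (qmax / 2)
    q = torch.clamp(torch.round(groups / scale) + qmax // 2, 0, qmax)
    w_hat = (q - qmax // 2) * scale                     # dequantized weights
    return q.to(torch.uint8), scale, w_hat.reshape(w.shape)

w = torch.randn(4096, 4096)
_, _, w_hat = quantize_groupwise(w, group_size=128)
print(f"mean |error|: {(w - w_hat).abs().mean():.4f}")  # small but nonzero
```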

Q: What are the recommended use cases?

The model is designed for adult content generation and should only be used by adults in appropriate contexts. It is particularly well suited to deployments where memory is constrained but generation quality must be maintained, such as running a 13B model on a single consumer GPU.
