Pygmalion-3-12B
Property | Value |
---|---|
Base Model | Mistral Nemo |
Parameters | 12 Billion |
License | Apache 2.0 |
Training Hardware | 8x NVIDIA A40 GPUs |
Model URL | huggingface.co/PygmalionAI/Pygmalion-3-12B |
What is Pygmalion-3-12B?
Pygmalion-3-12B is an advanced roleplaying AI model developed by PygmalionAI, built upon Mistral's Nemo base model. It represents a significant advancement in open-source roleplaying capabilities, trained on hundreds of millions of tokens of conversations and creative writing. The model is specifically designed for immersive character interactions and creative writing scenarios.
Implementation Details
The model was trained as a rank-32 LoRA adapter using DeepSpeed ZeRO optimization, with a learning rate of 2e-4 and a total batch size of 24 across all GPUs. It implements the standard ChatML format for input/output handling and includes special "Enter X mode" functionality carried over from its predecessor, Pygmalion-2.
- Trained using the PIPPA dataset and roleplaying forums
- Employs cosine learning rate scheduler with 100-step warmup
- Supports detailed character interactions and creative writing
- Uses ChatML format for standardized prompting
Core Capabilities
- Advanced character roleplay and interaction
- Detailed and immersive response generation
- Creative writing and storytelling
- Flexible system prompt customization
- Character personality maintenance
Frequently Asked Questions
Q: What makes this model unique?
Pygmalion-3-12B stands out for its specialized focus on roleplaying and creative writing, built with a permissive license that allows for community development and modification. Its training on extensive roleplay data and implementation of ChatML format makes it particularly suited for character-based interactions.
Q: What are the recommended use cases?
The model is specifically designed for fictional writing and entertainment purposes. It excels in roleplaying scenarios, character-based interactions, and creative writing applications. It's important to note that it's not intended for factual or safety-critical applications.