MN-Slush
| Property | Value |
|---|---|
| Base Model | Mistral-Nemo-Instruct-2407 |
| Training Method | Two-stage LoRA with TIES merge |
| Context Size | 16384 tokens |
| Recommended Settings | Temperature: 1.0, Min-P: 0.1, DRY: 0.8 |
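
To make the recommended Temperature and Min-P values concrete, here is a minimal sketch of how those two samplers shape a token distribution. It is not taken from any particular inference backend: the function name `min_p_filter` and the toy logits are illustrative, and the order in which samplers run varies between backends (this sketch applies temperature first, which makes no difference at temperature 1.0). DRY is left out.

```python
import math

def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Apply temperature scaling, then Min-P filtering, and renormalize.

    Min-P keeps every token whose probability is at least
    `min_p` times the probability of the most likely token.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Numerically stable softmax.
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]

# Toy logits for a four-token vocabulary (illustrative values only).
probs = min_p_filter([2.0, 1.0, 0.0, -3.0], temperature=1.0, min_p=0.1)
print(probs)
```

With these toy logits the least likely token falls below one tenth of the top token's probability and is removed, while the other three remain available for sampling.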
What is MN-Slush?
MN-Slush is a language model fine-tuned from Mistral-Nemo-Instruct-2407 for creative writing and roleplaying. It is trained in two stages and uses high LoRA dropout during training, with the aim of improving generalization and creativity.
Implementation Details
Training runs in two stages. Stage 1 continues pretraining with high LoRA dropout (0.5), using LoRA+ with rank 64 and alpha 128. Stage 2 fine-tunes with a smaller adapter (rank 32, alpha 64) to strengthen roleplaying while keeping the model stable.
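
The stage hyperparameters can be written down as a small sketch. The `LoraStage` class and `loraplus_learning_rates` helper are hypothetical names, not part of any training library. The Stage 2 dropout is not given in the text and is left as `None`. The base learning rate and LoRA+ ratio in the example call are placeholders, since the text does not state them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LoraStage:
    name: str
    rank: int
    alpha: int
    dropout: Optional[float]  # None where the text gives no value

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the adapter update by alpha / rank.
        return self.alpha / self.rank

STAGE_1 = LoraStage("pretraining continuation", rank=64, alpha=128, dropout=0.5)
STAGE_2 = LoraStage("roleplay fine-tune", rank=32, alpha=64, dropout=None)

def loraplus_learning_rates(base_lr: float, ratio: float) -> dict:
    """LoRA+ trains the B matrices with a larger learning rate than the A matrices."""
    return {"lora_A": base_lr, "lora_B": base_lr * ratio}

for stage in (STAGE_1, STAGE_2):
    print(stage.name, "scaling =", stage.scaling)

# Placeholder values; the text does not give the actual learning rate or ratio.
example_lrs = loraplus_learning_rates(base_lr=1e-4, ratio=16.0)
print(example_lrs)
```

Both stages keep alpha at twice the rank, so the adapter scaling factor is 2.0 in each case even though the adapter's capacity is halved in Stage 2.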
- Uses the TIES merge method to combine model weights
- Uses bfloat16 precision for efficient computation
- Trained on 6 specialized datasets
- Supports a 16384-token context window
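
As an illustration of the TIES merge mentioned in the list above, the following is a minimal pure-Python sketch of its three steps (trim, elect sign, disjoint merge) on flat lists of numbers. It is not the pipeline used for this model: the text does not say which checkpoints are merged or with what density, so the inputs, the `density` of 0.5, and the function names are assumptions for demonstration.

```python
def trim(delta, density):
    """Keep roughly the largest-magnitude `density` fraction of entries; zero the rest.

    Entries tied with the cutoff magnitude are all kept.
    """
    keep = max(1, round(len(delta) * density))
    threshold = sorted((abs(d) for d in delta), reverse=True)[keep - 1]
    return [d if abs(d) >= threshold else 0.0 for d in delta]

def ties_merge(base, finetuned, density=0.5, scale=1.0):
    """Merge several fine-tuned weight vectors into `base` using TIES."""
    # Step 1: task vectors (fine-tuned minus base), trimmed by magnitude.
    deltas = [trim([f - b for f, b in zip(ft, base)], density) for ft in finetuned]

    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # Step 2: elect the sign with the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Step 3: average only the values that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        update = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + scale * update)
    return merged

# Toy weights (illustrative values only).
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.1, 0.5]
model_b = [3.0, 1.0, -0.2, -0.5]

merged = ties_merge(base, [model_a, model_b], density=0.5)
print(merged)  # [2.0, -2.0, 0.0, 0.0]
```

In the second position the two models disagree on sign. The negative update has the larger magnitude, so it wins and the conflicting positive value is discarded instead of being averaged in.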
Core Capabilities
- Enhanced creative writing and storytelling
- Advanced roleplaying interactions
- Improved text generation consistency
- Long-context understanding and retention
Frequently Asked Questions
Q: What makes this model unique?
The model combines two-stage training at a high LoRA dropout rate with a TIES merge, with the goal of balancing creativity and coherence. Training also uses LoRA+, which sets a learning-rate ratio between the adapter matrices.
Q: What are the recommended use cases?
MN-Slush is well suited to creative writing, roleplaying scenarios, and interactive storytelling. It works best with the SillyTavern preset for Mistral V2 & V3.