MN-Slush

Property	Value
Base Model	Mistral-Nemo-Instruct-2407
Training Method	Two-stage LoRA with TIES merge
Context Size	16384 tokens
Recommended Settings	Temperature: 1.0, Min-P: 0.1, DRY: 0.8
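
As a rough illustration of what the Temperature and Min-P values above control, the sketch below applies temperature scaling and then min-p filtering to a list of logits. The function names and the temperature-before-min-p order are assumptions made for this example; real backends may order samplers differently, and the DRY repetition penalty is not modelled here.

```python
import math
import random


def min_p_filter(logits, temperature=1.0, min_p=0.1):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    # Temperature scaling: values above 1 flatten the distribution, below 1 sharpen it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: drop every token whose probability is below min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample(logits, temperature=1.0, min_p=0.1, rng=random):
    """Draw one token index from the filtered distribution (hypothetical helper)."""
    probs = min_p_filter(logits, temperature, min_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With the recommended values, a token survives the filter only if it is at least one tenth as likely as the most likely token.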

What is MN-Slush?

MN-Slush is a language model tuned for creative writing and roleplaying. It is built on Mistral-Nemo-Instruct-2407 and trained in two stages, using a high LoRA dropout rate with the aim of improving generalization and creativity.

Implementation Details

Training runs in two stages. Stage 1 continues pretraining with a high LoRA dropout of 0.5, using LoRA+ with rank 64 and alpha 128. Stage 2 fine-tunes with a smaller adapter (rank 32, alpha 64) to strengthen roleplaying while keeping the model stable.

Combines models with the TIES merge method
Uses bfloat16 precision for efficient computation
Trained on 6 specialized datasets
Supports a 16384-token context window
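
The two stage configurations described above can be written down as plain data. This is a minimal sketch: the `LoraStage` class and its field names are hypothetical and not taken from the model's training code, and the stage 2 dropout is left unset because the text does not state it. The `scaling` property uses the standard LoRA convention of alpha divided by rank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoraStage:
    """Hypothetical container for the per-stage LoRA settings named in the text."""
    name: str
    rank: int
    alpha: int
    dropout: Optional[float]  # None where the text gives no value

    @property
    def scaling(self) -> float:
        # Standard LoRA scaling factor applied to the adapter update.
        return self.alpha / self.rank


STAGE_1 = LoraStage(name="pretraining continuation", rank=64, alpha=128, dropout=0.5)
STAGE_2 = LoraStage(name="roleplay fine-tuning", rank=32, alpha=64, dropout=None)

for stage in (STAGE_1, STAGE_2):
    print(f"{stage.name}: rank={stage.rank}, alpha={stage.alpha}, scaling={stage.scaling}")
```

Both stages keep the same alpha-to-rank ratio of 2, so the adapter update is scaled identically even though stage 2 halves the rank.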

Core Capabilities

Enhanced creative writing and storytelling
Advanced roleplaying interactions
Improved text generation consistency
Coherence maintained across long contexts

Frequently Asked Questions

Q: What makes this model unique?

MN-Slush combines two-stage training at a high LoRA dropout rate with a TIES merge, aiming for a balance between creativity and coherence. It also uses LoRA+, which trains the two LoRA matrices at different learning rates.
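
For readers unfamiliar with TIES merging, the sketch below shows its three steps (trim, elect sign, disjoint mean) on flat lists of floats. It is a toy illustration under stated assumptions: the function names, the `density` and `weight` parameters, and the example values are hypothetical, and the text does not say which checkpoints were merged or with what settings.

```python
def trim(task_vector, density):
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    k = max(1, round(len(task_vector) * density))
    cutoff = sorted((abs(v) for v in task_vector), reverse=True)[k - 1]
    # Entries tied with the cutoff are kept, so slightly more than k may survive.
    return [v if abs(v) >= cutoff else 0.0 for v in task_vector]


def ties_merge(base, finetuned_models, density=0.5, weight=1.0):
    """Merge fine-tuned parameter lists into `base` using the TIES steps."""
    # Task vectors: what each fine-tuned model changed relative to the base.
    task_vectors = [[f - b for f, b in zip(model, base)] for model in finetuned_models]
    trimmed = [trim(tv, density) for tv in task_vectors]

    merged = []
    for i, b in enumerate(base):
        column = [tv[i] for tv in trimmed]
        # Elect sign: the sign of the sum is the direction with more total magnitude.
        total = sum(column)
        sign = 1 if total > 0 else -1 if total < 0 else 0
        # Disjoint mean: average only the entries that agree with the elected sign.
        agreeing = [v for v in column if v * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + weight * delta)
    return merged


base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, 0.1, -2.0, 0.5]
model_b = [3.0, -0.2, 1.0, 0.0]
print(ties_merge(base, [model_a, model_b], density=0.5))
```

In the third position the two models disagree in sign; the larger-magnitude change wins the election, and the conflicting value is excluded from the average rather than cancelling it out.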

Q: What are the recommended use cases?

MN-Slush is well suited to creative writing, roleplaying scenarios, and interactive storytelling. It is intended to be used with the SillyTavern preset designed for Mistral V2 & V3.
