MS3-Tantum-24B-v0.1
Property | Value |
---|---|
Parameter Count | 24B |
Model Type | Merged Language Model |
Architecture | Mistral-based |
Author | Nohobby |
Model URL | Hugging Face |
What is MS3-Tantum-24B-v0.1?
MS3-Tantum-24B-v0.1 is a sophisticated merged language model that combines multiple 24B parameter models to create a versatile chat and roleplay assistant. The model is built upon the Mistral architecture and incorporates various specialized components including RP-Whole (RP-Broth) and multiple instruction-tuned models.
Implementation Details
The model is created through a complex merging process involving multiple stages and specialized merging techniques including SCE (Sparse Cross Entropy) and della_linear methods. It utilizes careful weight distributions across different projection layers (v_proj, o_proj, up_proj, gate_proj, down_proj) and implements sophisticated density and weight parameters for optimal performance.
- Employs bfloat16 precision for efficient computation
- Utilizes multiple merge stages including MS3-test-Merge-1, RP-half1, RP-half2, and final Tantum stages
- Incorporates specialized components for roleplay and chat capabilities
- Features thinking capabilities through
tags - Supports multiple chat formats including ChatML and Llama3
Core Capabilities
- Advanced roleplay and character adherence
- Structured thinking through
tag implementation - Multi-format chat support
- Improved prose generation
- Enhanced instruction following abilities
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its specialized merge architecture that combines roleplay capabilities with thinking abilities, while maintaining strong general chat performance. The careful weight distribution across different projection layers and multiple merge stages creates a balanced model suitable for various use cases.
Q: What are the recommended use cases?
The model is particularly well-suited for: 1) Roleplay scenarios requiring strong character adherence, 2) Chat applications needing structured thinking capabilities, 3) General conversation with improved prose generation, 4) Applications requiring support for multiple chat formats including ChatML and Llama3.