Llama-3.1-8B-Stheno-v3.4
Property | Value |
---|---|
Parameter Count | 8.03B |
Model Type | Language Model |
License | CC-BY-NC-4.0 |
Tensor Type | BF16 |
What is Llama-3.1-8B-Stheno-v3.4?
Llama-3.1-8B-Stheno-v3.4 is an advanced language model built on the LLaMA-3.1 architecture, specifically designed for creative writing and conversational tasks. This model underwent a sophisticated multi-stage fine-tuning process, incorporating both conversational-instruct and creative writing datasets to enhance its capabilities.
Implementation Details
The model features a two-stage fine-tuning approach: first on multi-turn conversational-instruct data, followed by creative writing and roleplay datasets combined with creative-based instruct datasets. The training data includes a mixture of human and Claude-generated content, with significant improvements in dataset quality and diversity.
- Multi-turn conversation-based instruct datasets for enhanced coherency
- Improved single-turn instruct with Claude 3.5 Sonnet and Claude 3 Opus responses
- 55% more roleplaying examples based on Gryphe's Charcard RP Sets
- 40% increase in creative writing examples
- Enhanced system prompt adherence and reasoning capabilities
Core Capabilities
- Advanced creative writing and storytelling
- Improved multi-turn conversation coherency
- Enhanced reasoning and spatial awareness
- Robust system prompt adherence
- Specialized roleplay capabilities
Frequently Asked Questions
Q: What makes this model unique?
This model stands out due to its specialized multi-stage fine-tuning process and its focus on creative writing and conversational abilities. The incorporation of carefully curated datasets and extensive filtering makes it particularly effective for creative and interactive applications.
Q: What are the recommended use cases?
The model is best suited for creative writing, roleplay scenarios, and multi-turn conversations. For optimal results, use the L3 Instruct Formatting with Euryale 2.1 Preset, and recommended settings of 1.4 Temperature and 0.2 min_p.