# L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | CC-BY-NC-4.0 |
| Architecture | LLaMA3-based |
| Quantization | GGUF-IQ-Imatrix |
## What is L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix?
L3-8B-Stheno-v3.2-GGUF-IQ-Imatrix is a quantized release of the Stheno-v3.2 model, built on the LLaMA 3 architecture. Version 3.2 evolves from v3.1 by incorporating both SFW and NSFW storywriting data while maintaining strong performance in roleplay and conversational tasks.
## Implementation Details
The model is distributed as GGUF-IQ-Imatrix quantizations; the Q4_K_M-imat variant, at 4.89 bits per weight (BPW), is sized to fit 8 GB VRAM GPUs. Training used refined hyperparameters and carefully curated data from multiple sources:
- Integrated mixture of SFW/NSFW storywriting data from Gryphe's Opus-WritingPrompts
- Enhanced instruction/assistant-style data integration
- Improved roleplaying sample quality through manual filtering
- Optimized hyperparameters for reduced loss levels
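As a rough sanity check on the 8 GB VRAM claim, the quantized weight size can be estimated from the parameter count and BPW figure above (a back-of-envelope sketch only; an actual GGUF file also carries metadata, and the runtime needs additional memory for the KV cache and activations):

```python
# Back-of-envelope size estimate for the Q4_K_M-imat quant.
params = 8.03e9  # parameter count (8.03B)
bpw = 4.89       # bits per weight for Q4_K_M-imat

size_bytes = params * bpw / 8        # bits -> bytes
size_gib = size_bytes / (1024 ** 3)  # bytes -> GiB

# Roughly 4.6 GiB of weights, leaving headroom for context on an 8 GB card.
print(f"~{size_gib:.2f} GiB of weights")
```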
## Core Capabilities
- Balanced handling of SFW/NSFW content
- Enhanced storywriting and narration abilities
- Improved multi-turn coherency
- Better prompt and instruction adherence
- Support for context sizes up to 12,288 tokens
## Frequently Asked Questions
**Q: What makes this model unique?**

A: This model stands out for its balanced approach to content generation, improved multi-turn coherency, and optimized performance on 8 GB VRAM GPUs. The GGUF-IQ-Imatrix quantization makes it particularly efficient in resource-constrained environments.
**Q: What are the recommended use cases?**

A: The model excels at roleplay, creative writing, and conversational tasks. It is particularly well suited to SillyTavern setups and general narrative generation, with a recommended temperature between 1.12 and 1.22 and Min-P of 0.075.
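The Min-P setting can be read as a probability floor relative to the most likely token: candidates whose probability falls below `min_p` times the top probability are discarded before sampling. A minimal sketch of that filtering step (illustrative only; backends such as llama.cpp and SillyTavern apply this internally, and the example distribution is made up):

```python
def min_p_filter(probs, min_p=0.075):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    # Renormalize the surviving probabilities so they sum to 1.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

# Hypothetical distribution over four candidate tokens.
probs = {"the": 0.60, "a": 0.25, "an": 0.10, "zzz": 0.04}
filtered = min_p_filter(probs)
# "zzz" is dropped: 0.04 < 0.075 * 0.60 = 0.045; the rest are renormalized.
```

A lower Min-P keeps more low-probability tokens (more variety); a higher value trims the tail more aggressively, which pairs well with the relatively high temperatures recommended above.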