# SuperHOT-13B-8K-No-RLHF-Test

| Property | Value |
|---|---|
| License | MIT |
| Author | kaiokendev |
| Base Model Size | 13B parameters |
| Context Length | 8192 tokens |
## What is superhot-13b-8k-no-rlhf-test?
SuperHOT prototype 2 is an NSFW-focused LoRA built on a 13B-parameter base model, extending the usable context length to 8K tokens. This version implements custom positional encoding and omits RLHF, trading alignment constraints for greater output flexibility.
## Implementation Details
The model uses dilated RoPE (DoPE) positional encoding and was trained on 1,200 samples over 3 epochs. At inference time, a custom monkey patch to the position-embedding code is required for correct behavior at extended context lengths.
- Learning rate: 3e-4 with AdamW optimizer
- LoRA rank: 2, Alpha: 8
- Trained modules: q_proj, k_proj, v_proj, o_proj, and all bias parameters
- Position embedding scaling factor: 0.25
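The scaling factor above can be illustrated with a minimal sketch of position-interpolated rotary embeddings. This is an illustrative assumption, not the actual SuperHOT patch: positions are multiplied by 0.25 before the rotary angles are computed, so 8192 positions span the same angle range the base model originally saw for 2048.

```python
# Minimal sketch of position-interpolated RoPE (an assumption, not the
# actual SuperHOT monkey patch): scale positions by 0.25 before computing
# the rotary angles, so 8192 tokens cover the angle range of 2048.
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    # Standard RoPE frequency schedule: dim/2 inverse frequencies
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    # Scale positions before taking the outer product with the frequencies
    return np.outer(positions * scale, inv_freq)

# With scale=0.25, position 8188 yields the same angles as position 2047
# does in the unscaled embedding (8188 * 0.25 == 2047).
scaled = rope_angles(np.arange(8192), scale=0.25)
unscaled = rope_angles(np.arange(2048), scale=1.0)
```

In practice this scaling is applied by monkey-patching the model's rotary-embedding module before inference, which is why the card notes a patch requirement for extended context.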
## Core Capabilities
- Extended context handling up to 8K tokens
- Optimized positional encoding for improved performance
- Available in multiple formats (GGML, CUDA, CUDA 32g)
- Efficient 4-bit quantization support
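The "32g" in the CUDA 32g format refers to a quantization group size of 32, i.e., every 32 weights share one scale. A rough sketch of groupwise symmetric 4-bit quantization follows; it illustrates the storage idea only and is not the actual GPTQ algorithm used to produce the released files.

```python
# Illustrative groupwise 4-bit quantization (group size 32, as in "32g").
# A simplified sketch, not the GPTQ algorithm used for the real weights.
import numpy as np

def quantize_4bit(weights, group_size=32):
    """Each group of `group_size` weights shares one float scale;
    values are stored as signed integers in [-8, 7]."""
    w = weights.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)  # approximate reconstruction of w
```

Smaller groups track local weight magnitudes more closely (lower error) at the cost of storing more scales, which is the trade-off behind offering both grouped and ungrouped CUDA variants.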
## Frequently Asked Questions
**Q: What makes this model unique?**

A: Its distinctive feature is extended-context handling up to 8K tokens through a custom positional-encoding implementation, combined with unrestricted NSFW output, since no RLHF was applied.
**Q: What are the recommended use cases?**

A: The model targets NSFW content generation with long context requirements, such as long-form writing where details established early in the conversation must stay consistent across thousands of tokens.