Senku-70B-Full
| Property | Value |
|---|---|
| Parameter Count | 69B |
| License | CC0-1.0 |
| Base Model | miqu-1-70b-sf |
| Training Type | PEFT/QLoRA |
What is Senku-70B-Full?
Senku-70B-Full is a large language model built on the miqu-1-70b-sf base and fine-tuned on the SlimOrca dataset using QLoRA. With the ChatML prompt template, it scores 85.09 on EQ-Bench, surpassing previous open-weight results on that benchmark.
Implementation Details
The model is trained as a QLoRA adapter with a learning rate of 0.0002, gradient accumulation over 4 steps, and a sequence length of 8192 tokens, and it employs 8-bit training optimizations and flash attention for efficiency. Key configuration details (sketched in code after this list):
- Implements ChatML format for improved interaction
- Uses LoRA rank 32 with alpha 16
- Applies LoRA adapters to the attention projection layers
- Trained with a cosine learning-rate scheduler
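As a rough illustration of how these hyperparameters fit together, here is a minimal sketch of the QLoRA setup using the Hugging Face transformers and peft libraries. The target module names, the dropout value, and the 152334H/miqu-1-70b-sf repo id are assumptions for illustration, not details stated in this card.

```python
# Hypothetical reconstruction of the QLoRA setup described above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "152334H/miqu-1-70b-sf",                  # assumed repo id for the base model
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # flash attention, per the card
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                                     # LoRA rank from the card
    lora_alpha=16,                            # LoRA alpha from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,                        # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training then uses lr=2e-4, gradient_accumulation_steps=4, an 8192-token
# sequence length, an 8-bit optimizer, and a cosine LR schedule, per the
# hyperparameters listed above.
```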
Core Capabilities
- EQ-Bench: 85.09 (with ChatML template)
- GSM8k: 71.04
- HellaSwag: 87.88
- MMLU: 75.20
- TruthfulQA: 61.96
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its EQ-Bench performance: it was reported as the first open-weight model to surpass GPT-4 on that benchmark. It also adopts an optimized ChatML format that resolves earlier stopping-token issues.
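For concreteness, a ChatML prompt for this model looks like the sketch below; the system message is a placeholder chosen for illustration, not a recommendation from the card.

```python
# Minimal sketch of the ChatML prompt format.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"   # placeholder system message
    "<|im_start|>user\n"
    "Explain QLoRA in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Treat <|im_end|> as the stop token when generating.
```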
Q: What are the recommended use cases?
Given its strong performance across multiple benchmarks, the model is well-suited for complex reasoning tasks, mathematical problem-solving, and general language understanding applications. The ChatML format makes it particularly effective for conversational AI implementations.
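For conversational use, a minimal inference sketch with the transformers library might look like the following. The ShinojiResearch/Senku-70B-Full repo id is an assumption based on the model's name; if the tokenizer does not ship a ChatML chat template, build the prompt manually as shown above.

```python
# Hypothetical inference example; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ShinojiResearch/Senku-70B-Full"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Solve: 12 * 7 + 5 = ?"}]
# apply_chat_template renders the ChatML conversation, including the
# <|im_start|>/<|im_end|> markers, when the tokenizer defines a template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```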