Senku-70B-Full
| Property | Value |
|---|---|
| Parameter Count | 69B |
| License | CC0-1.0 |
| Base Model | miqu-1-70b-sf |
| Training Type | PEFT/QLoRA |
What is Senku-70B-Full?
Senku-70B-Full is a large language model built on the miqu-1-70b-sf base and fine-tuned on the SlimOrca dataset using QLoRA. With the ChatML prompt template, it scores 85.09 on EQ-Bench, surpassing previous open-weight results on that benchmark.
Implementation Details
The model is trained as a QLoRA adapter with a learning rate of 0.0002, gradient accumulation over 4 steps, and a sequence length of 8192 tokens, and it employs 8-bit training optimizations and flash attention for efficiency. Key configuration details (sketched in code after this list):
- Implements ChatML format for improved interaction
- Uses LoRA rank 32 with alpha 16
- Applies LoRA adapters to the attention projection layers
- Trained with a cosine learning-rate scheduler
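As a rough illustration of how these hyperparameters fit together, here is a minimal sketch of the QLoRA setup using the Hugging Face transformers and peft libraries. The target module names, the dropout value, and the 152334H/miqu-1-70b-sf repo id are assumptions for illustration, not details stated in this card.

```python
# Hypothetical reconstruction of the QLoRA setup described above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                        # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "152334H/miqu-1-70b-sf",                  # assumed repo id for the base model
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # flash attention, per the card
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                                     # LoRA rank from the card
    lora_alpha=16,                            # LoRA alpha from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,                        # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Training then uses lr=2e-4, gradient_accumulation_steps=4, an 8192-token
# sequence length, an 8-bit optimizer, and a cosine LR schedule, per the
# hyperparameters listed above.
```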
Core Capabilities
- EQ-Bench: 85.09 (with ChatML template)
- GSM8k: 71.04
- HellaSwag: 87.88
- MMLU: 75.20
- TruthfulQA: 61.96
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its EQ-Bench performance: it was reported as the first open-weight model to surpass GPT-4 on that benchmark. It also adopts an optimized ChatML format that resolves earlier stopping-token issues.
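For concreteness, a ChatML prompt for this model looks like the sketch below; the system message is a placeholder chosen for illustration, not a recommendation from the card.

```python
# Minimal sketch of the ChatML prompt format.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"   # placeholder system message
    "<|im_start|>user\n"
    "Explain QLoRA in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Treat <|im_end|> as the stop token when generating.
```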
Q: What are the recommended use cases?
Given its strong performance across multiple benchmarks, the model is well-suited for complex reasoning tasks, mathematical problem-solving, and general language understanding applications. The ChatML format makes it particularly effective for conversational AI implementations.
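For conversational use, a minimal inference sketch with the transformers library might look like the following. The ShinojiResearch/Senku-70B-Full repo id is an assumption based on the model's name; if the tokenizer does not ship a ChatML chat template, build the prompt manually as shown above.

```python
# Hypothetical inference example; the repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ShinojiResearch/Senku-70B-Full"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Solve: 12 * 7 + 5 = ?"}]
# apply_chat_template renders the ChatML conversation, including the
# <|im_start|>/<|im_end|> markers, when the tokenizer defines a template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```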