Senku-70B-Full

Maintained by: ShinojiResearch

Parameter Count: 69B
License: CC0-1.0
Base Model: miqu-1-70b-sf
Training Type: PEFT/QLoRA

What is Senku-70B-Full?

Senku-70B-Full is a 69B-parameter large language model built on the miqu-1-70b-sf base and fine-tuned on the SlimOrca dataset using QLoRA. With the ChatML prompt template, it scores 85.09 on EQ-Bench, ahead of prior open-weight results on that benchmark.
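For orientation, here is a minimal inference sketch using the transformers library. The repo id ShinojiResearch/Senku-70B-Full is inferred from the card title and maintainer, and running a 70B model at float16 requires multiple high-memory GPUs or quantization:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ShinojiResearch/Senku-70B-Full"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # shard across available GPUs
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve: 17 * 24 = ?"},
]
# apply_chat_template renders the conversation in the model's ChatML format
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```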

Implementation Details

The model was trained as a QLoRA adapter with a learning rate of 0.0002, gradient accumulation steps of 4, and a sequence length of 8192. Training used 8-bit optimizations and flash attention for throughput; a configuration sketch follows the list below.

  • Implements ChatML format for improved interaction
  • Uses LoRA rank 32 with alpha 16
  • Targets the attention projection layers (including the key projection) with LoRA adapters
  • Trained with cosine learning rate scheduler
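A hedged sketch of this setup using PEFT and transformers is shown below. The hyperparameters (rank 32, alpha 16, learning rate 0.0002, gradient accumulation 4, cosine schedule, 8-bit optimization) come from this card; the target module names, dropout, and optimizer choice are assumptions for illustration:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                    # LoRA rank 32
    lora_alpha=16,           # LoRA alpha 16
    lora_dropout=0.05,       # assumed; not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="senku-70b-qlora",   # hypothetical output path
    learning_rate=2e-4,             # 0.0002 as reported
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",     # cosine learning rate scheduler
    optim="paged_adamw_8bit",       # an 8-bit optimizer; exact choice assumed
    bf16=True,                      # assumed mixed-precision setting
)

# Sequence length 8192 and flash attention are set when loading the base model,
# e.g. AutoModelForCausalLM.from_pretrained(..., attn_implementation="flash_attention_2")
# with inputs truncated or packed to 8192 tokens.
```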

Core Capabilities

  • EQ-Bench: 85.09 (with ChatML template)
  • GSM8k: 71.04
  • HellaSwag: 87.88
  • MMLU: 75.20
  • TruthfulQA: 61.96

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its performance on EQ-Bench, where it was the first open-weight model to surpass GPT-4. It also implements an optimized ChatML format that addresses earlier stopping-token issues.
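To make the ChatML point concrete, here is a sketch of the turn layout and one common way to stop generation at the end-of-turn marker; the token handling shown is standard transformers usage, not a confirmed detail of this model's tokenizer config:

```python
# ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers.
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "What is the capital of France?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# To avoid run-on generations, stop at the ChatML end-of-turn token
# (assumes <|im_end|> is registered in the tokenizer's vocabulary):
# eos_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
# model.generate(**inputs, eos_token_id=eos_id)
```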

Q: What are the recommended use cases?

Given its strong performance across multiple benchmarks, the model is well-suited for complex reasoning tasks, mathematical problem-solving, and general language understanding applications. The ChatML format makes it particularly effective for conversational AI implementations.
