# OpenHermes-13B
| Property | Value |
|---|---|
| Base Model | LLaMA-2 13B |
| License | MIT |
| Training Data | 242,000 entries |
| Language | English |
## What is OpenHermes-13B?
OpenHermes-13B is an advanced language model built on the LLaMA-2-13B architecture, fine-tuned on a comprehensive dataset of 242,000 high-quality GPT-4 generated examples. It represents a significant milestone as the first Hermes model with a fully open-source dataset, making it particularly valuable for both research and practical applications.
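The section above does not specify the prompt template the model expects. Instruction-tuned models of this generation were frequently trained with Alpaca-style prompts, so the helper below is a hypothetical sketch of that format (the function name, and the assumption that this model uses Alpaca formatting, are not confirmed by the text; check the model card before relying on it):

```python
def format_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style instruction prompt (assumed format, not confirmed)."""
    if input_text:
        # Variant with an additional input/context field.
        return (
            "### Instruction:\n" + instruction + "\n\n"
            "### Input:\n" + input_text + "\n\n"
            "### Response:\n"
        )
    # Instruction-only variant.
    return "### Instruction:\n" + instruction + "\n\n### Response:\n"
```

The trailing `### Response:` header leaves the completion point open so the model's generation continues directly as the answer.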
## Implementation Details
The model was trained on a carefully curated combination of datasets, including GPTeacher, WizardLM, Airoboros GPT-4, Camel-AI's domain-expert datasets, CodeAlpaca, and others. Training used a learning rate of 2e-05 with the Adam optimizer and a cosine learning-rate scheduler over 3 epochs.
- Multi-GPU training across 8 devices
- Gradient accumulation steps: 8
- Total batch size: 128
- Warmup steps: 300
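The numbers above pin down the remaining hyperparameter: total batch size equals GPUs × gradient-accumulation steps × per-device batch, so the per-device micro-batch must be 128 / (8 × 8) = 2. A minimal sketch of that arithmetic and of a cosine schedule with linear warmup using the stated values (the schedule function itself is illustrative, not the exact training code):

```python
import math

NUM_GPUS = 8           # multi-GPU training across 8 devices
GRAD_ACCUM_STEPS = 8   # gradient accumulation steps
TOTAL_BATCH = 128      # total (effective) batch size

# Per-device micro-batch implied by the reported totals: 128 / (8 * 8) = 2.
per_device_batch = TOTAL_BATCH // (NUM_GPUS * GRAD_ACCUM_STEPS)

def lr_at_step(step: int, total_steps: int,
               base_lr: float = 2e-05, warmup_steps: int = 300) -> float:
    """Cosine learning-rate schedule with linear warmup (illustrative sketch)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

At step 300 the schedule peaks at the full 2e-05 and then decays along the cosine curve toward zero by the final step.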
## Core Capabilities
- Strong performance on the GPT4All benchmark suite (70.36% average)
- Improved reasoning capabilities shown in BigBench tests
- Effective handling of complex instructions and domain-specific tasks
- Code generation and analysis capabilities
## Frequently Asked Questions
**Q: What makes this model unique?**
OpenHermes-13B stands out for its fully open-source training dataset and strong benchmark results, particularly in reasoning tasks. It improves on previous Hermes versions in the GPT4All and BigBench suites, making it well suited to general-purpose applications.
**Q: What are the recommended use cases?**
The model excels in instruction following, reasoning tasks, and code-related applications. It's particularly well-suited for applications requiring strong reasoning capabilities and general-purpose text generation while maintaining ethical boundaries through careful dataset curation.