# Nous-Hermes-Llama2-GGML
| Property | Value |
|---|---|
| Base Model | Llama 2 13B |
| License | Llama 2 |
| Format | GGML (deprecated) |
| Language | English |
## What is Nous-Hermes-Llama2-GGML?
Nous-Hermes-Llama2-GGML is a quantized version of the Nous-Hermes-Llama2 13B model, optimized for CPU inference with optional GPU acceleration. The model was fine-tuned on over 300,000 instructions, primarily generated by GPT-4, and achieves top benchmark scores across multiple metrics, including leading positions on the ARC-c, ARC-e, Hellaswag, and OpenBookQA benchmarks.
## Implementation Details
The model is available in multiple quantization formats, ranging from 2-bit to 8-bit precision, each offering a different trade-off between file size, output quality, and resource requirements. The files use the GGML format; note that GGML has since been deprecated in favor of GGUF, so recent llama.cpp builds may no longer load them directly.
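As a rough illustration of the size trade-off, a small helper can pick the largest quantization that fits a given memory budget. This is a hypothetical sketch: only the two file sizes stated on this card are used, the other quantization levels fall between them, and real RAM usage is somewhat higher than file size.

```python
# Hypothetical helper: choose the largest quantization of the
# 13B model whose file fits in a given memory budget.
# Sizes (GB) are the two endpoints quoted on this card; the
# intermediate quants (Q3_K..Q6_K) fall between them.
QUANT_SIZES_GB = {
    "q2_K": 5.74,
    "q8_0": 13.83,
}

def pick_quant(budget_gb):
    """Return the largest quantization fitting budget_gb, or None."""
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(8.0))   # only q2_K fits in 8 GB
print(pick_quant(16.0))  # q8_0 fits with room to spare
```

In practice, the usual advice is to pick the highest-bit quantization your hardware can hold, since quality degrades noticeably at 2-bit precision.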
- Multiple quantization options from 5.74GB (Q2_K) to 13.83GB (Q8_0)
- Supports CPU inference with GPU acceleration capability
- Uses Alpaca prompt format for consistency
- Trained with 4096 sequence length on 8x A100 80GB DGX hardware
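The Alpaca prompt format mentioned above can be assembled as a simple string template. This is a minimal sketch assuming the standard Alpaca preamble and `### Instruction:` / `### Response:` markers; consult the upstream model card for the exact recommended wording.

```python
def alpaca_prompt(instruction):
    """Wrap a user instruction in the Alpaca-style prompt template
    (standard Alpaca wording assumed; verify against the model card)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(alpaca_prompt("Summarize the plot of Hamlet in two sentences."))
```

The generated text should be read from the point after `### Response:`; using a consistent template matters because the model was fine-tuned on prompts in this shape.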
## Core Capabilities
- Strong performance in reasoning and comprehension tasks
- Excellent benchmark scores (70.0 on GPT4All benchmark average)
- Lower hallucination rate compared to similar models
- Long-form response generation
- Versatile task handling from creative writing to technical analysis
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its extensive fine-tuning on high-quality GPT-4-generated data, achieving state-of-the-art results on multiple benchmarks while maintaining a lower hallucination rate than comparable models. It is particularly notable for generating detailed, long-form responses.
Q: What are the recommended use cases?
The model is well-suited for a wide range of applications including creative writing, analytical tasks, coding assistance, and general instruction following. It's particularly effective when implemented through interfaces like LM Studio or custom Discord bots for specific use cases.