# Nous-Hermes-Llama2-GGML
| Property | Value |
|---|---|
| Base Model | Llama 2 13B |
| License | Llama 2 |
| Format | GGML (deprecated) |
| Language | English |
## What is Nous-Hermes-Llama2-GGML?
Nous-Hermes-Llama2-GGML is a quantized version of the Nous-Hermes-Llama2 13B model, optimized for CPU inference with optional GPU acceleration. The model was fine-tuned on over 300,000 instructions, primarily generated by GPT-4, and achieves top benchmark scores across multiple metrics, including leading positions on the ARC-c, ARC-e, Hellaswag, and OpenBookQA benchmarks.
## Implementation Details
The model is available in multiple quantization formats, ranging from 2-bit to 8-bit precision, each offering a different trade-off between file size, output quality, and resource requirements. The files use the GGML format; note that GGML has since been deprecated in favor of GGUF, so recent llama.cpp builds may no longer load them directly.
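As a rough illustration of the size trade-off, a small helper can pick the largest quantization that fits a given memory budget. This is a hypothetical sketch: only the two file sizes stated on this card are used, the other quantization levels fall between them, and real RAM usage is somewhat higher than file size.

```python
# Hypothetical helper: choose the largest quantization of the
# 13B model whose file fits in a given memory budget.
# Sizes (GB) are the two endpoints quoted on this card; the
# intermediate quants (Q3_K..Q6_K) fall between them.
QUANT_SIZES_GB = {
    "q2_K": 5.74,
    "q8_0": 13.83,
}

def pick_quant(budget_gb):
    """Return the largest quantization fitting budget_gb, or None."""
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items()
               if size <= budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(8.0))   # only q2_K fits in 8 GB
print(pick_quant(16.0))  # q8_0 fits with room to spare
```

In practice, the usual advice is to pick the highest-bit quantization your hardware can hold, since quality degrades noticeably at 2-bit precision.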
- Multiple quantization options from 5.74GB (Q2_K) to 13.83GB (Q8_0)
- Supports CPU inference with GPU acceleration capability
- Uses Alpaca prompt format for consistency
- Trained with 4096 sequence length on 8x A100 80GB DGX hardware
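The Alpaca prompt format mentioned above can be assembled as a simple string template. This is a minimal sketch assuming the standard Alpaca preamble and `### Instruction:` / `### Response:` markers; consult the upstream model card for the exact recommended wording.

```python
def alpaca_prompt(instruction):
    """Wrap a user instruction in the Alpaca-style prompt template
    (standard Alpaca wording assumed; verify against the model card)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(alpaca_prompt("Summarize the plot of Hamlet in two sentences."))
```

The generated text should be read from the point after `### Response:`; using a consistent template matters because the model was fine-tuned on prompts in this shape.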
## Core Capabilities
- Strong performance in reasoning and comprehension tasks
- Excellent benchmark scores (70.0 on GPT4All benchmark average)
- Lower hallucination rate compared to similar models
- Long-form response generation
- Versatile task handling from creative writing to technical analysis
## Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its extensive fine-tuning on high-quality GPT-4-generated data, achieving state-of-the-art results on multiple benchmarks while maintaining a lower hallucination rate than comparable models. It is particularly notable for generating detailed, long-form responses.
Q: What are the recommended use cases?
The model is well-suited for a wide range of applications including creative writing, analytical tasks, coding assistance, and general instruction following. It's particularly effective when implemented through interfaces like LM Studio or custom Discord bots for specific use cases.