Nous-Hermes-Llama2-GGML

Maintained By
TheBloke

Property     Value
Base Model   Llama 2 13B
License      Llama 2
Format       GGML (deprecated)
Language     English

What is Nous-Hermes-Llama2-GGML?

Nous-Hermes-Llama2-GGML is a quantized version of the Nous-Hermes-Llama2 13B model, optimized for CPU inference with optional GPU acceleration. The model was fine-tuned on over 300,000 instructions, primarily generated by GPT-4, and is notable for strong benchmark results, including leading scores on ARC-c, ARC-e, HellaSwag, and OpenBookQA at the time of release.

Implementation Details

The model is available in multiple quantization formats ranging from 2-bit to 8-bit precision, offering different trade-offs between file size, output quality, and resource requirements. The files use the GGML format, which has since been deprecated in favor of GGUF; recent llama.cpp builds no longer load GGML files, so newer tooling will require the GGUF release instead.

  • Multiple quantization options from 5.74GB (Q2_K) to 13.83GB (Q8_0)
  • Supports CPU inference with GPU acceleration capability
  • Uses Alpaca prompt format for consistency
  • Trained with 4096 sequence length on 8x A100 80GB DGX hardware
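The size range above can be translated into a rough storage cost per weight. A minimal sketch, assuming the file sizes quoted in this card and approximating the 13B model as 13.0e9 parameters (an estimate, not the exact count):

```python
# Rough bytes-per-parameter for the quantization extremes listed above.
# PARAMS is an approximation for a "13B" model, not an exact weight count.
PARAMS = 13.0e9

file_sizes_gb = {"Q2_K": 5.74, "Q8_0": 13.83}

for name, gb in file_sizes_gb.items():
    bytes_per_param = gb * 1e9 / PARAMS
    print(f"{name}: ~{bytes_per_param:.2f} bytes/parameter")
# Q2_K works out to ~0.44 bytes/parameter, Q8_0 to ~1.06 bytes/parameter.
```

The Q2_K figure comes out above a literal 2 bits (0.25 bytes) per weight because k-quant files keep some tensors at higher precision.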

Core Capabilities

  • Strong performance in reasoning and comprehension tasks
  • Excellent benchmark scores (70.0 on GPT4All benchmark average)
  • Lower hallucination rate compared to similar models
  • Long-form response generation
  • Versatile task handling from creative writing to technical analysis

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its extensive fine-tuning on high-quality GPT-4 generated data, achieving state-of-the-art performance on multiple benchmarks while maintaining lower hallucination rates. It's particularly notable for its ability to generate detailed, long-form responses.

Q: What are the recommended use cases?

The model is well-suited for a wide range of applications including creative writing, analytical tasks, coding assistance, and general instruction following. It's particularly effective when implemented through interfaces like LM Studio or custom Discord bots for specific use cases.
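Since the card states the model uses the Alpaca prompt format, a minimal prompt builder might look like the sketch below. The template text is the widely used Alpaca instruction template; verify the exact wording against the model card before relying on it in production.

```python
# Builds a prompt in the standard Alpaca instruction format (assumed here;
# confirm the exact template on the model card).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

print(build_prompt("Summarize the GGML format in one sentence."))
```

The completed text the model generates after "### Response:" is the answer; front ends like LM Studio typically apply this template automatically once configured.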
