Llama-2-13B-chat-GGML

Maintained By
TheBloke

Property         Value
Parameter Count  13 Billion
Model Type       Chat-optimized Language Model
Architecture     Llama 2
License          Meta Custom License
Research Paper   Llama 2 Paper

What is Llama-2-13B-chat-GGML?

Llama-2-13B-chat-GGML is a converted version of Meta's Llama 2 13B chat model, optimized for CPU and GPU inference using the GGML format. This model represents a middle ground in the Llama 2 family, balancing output quality against resource requirements. The underlying chat model is fine-tuned by Meta for dialogue applications, and this release provides multiple quantization options to suit different hardware configurations.

Implementation Details

The model is available in various quantization levels, from 2-bit to 8-bit, allowing users to trade model size against output quality. For example, the q4_K_M variant offers a good compromise: a 7.87GB model file with a 10.37GB RAM requirement. The newer k-quant methods (the *_K variants) are included alongside the original quantization formats.
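As a back-of-the-envelope check on those figures, file size scales with parameter count times effective bits per weight. The numbers below are assumptions for illustration (roughly 13.0 billion weights, and about 4.85 effective bits per weight for q4_K_M once block scales and metadata are included), not exact values from the model card:

```python
# Rough size estimate for a quantized model file.
# Assumptions (approximate, for illustration only):
#   - ~13.0e9 parameters in Llama-2-13B
#   - q4_K_M averages ~4.85 bits per weight including quantization metadata
params = 13.0e9
bits_per_weight = 4.85

size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB
print(f"estimated q4_K_M size: {size_gb:.2f} GB")  # lands near the listed 7.87GB
```

The extra ~2.5GB of RAM beyond the file size covers the KV cache, activations, and runtime overhead, which is why the RAM requirement exceeds the model size.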

  • Context length: 4096 tokens standard (expandable with RoPE scaling)
  • Multiple quantization options (q2_K through q8_0)
  • Supports GPU layer offloading for improved performance
  • Compatible with llama.cpp and various UI implementations

Core Capabilities

  • Optimized for dialogue and chat applications
  • Strong performance in helpfulness and safety benchmarks
  • Scores 54.8 on MMLU (13B version)
  • Enhanced truthfulness with 62.18% score on TruthfulQA
  • Zero toxicity rating in safety evaluations
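Because the model is dialogue-tuned, prompts should follow the Llama 2 chat template with its `[INST]` and `<<SYS>>` markers. A minimal single-turn formatter might look like this (a sketch; exact whitespace conventions follow Meta's reference formatting, and the BOS token is typically added by the inference runtime):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Format a single-turn prompt in the Llama 2 chat template."""
    return (
        f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"  # system message block
        f"{user} [/INST]"                           # user turn, then the model replies
    )

prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    "Explain GGML quantization in one sentence.",
)
print(prompt)
```

Sending plain, untemplated text still produces output, but responses tend to be noticeably worse than with the template the chat fine-tune was trained on.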

Frequently Asked Questions

Q: What makes this model unique?

This GGML version allows efficient CPU/GPU inference with various quantization options, making it accessible for consumer hardware while maintaining the quality of the original Llama 2 model.
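The idea that makes those quantization options possible can be illustrated with a toy symmetric 4-bit scheme (a deliberate simplification; the real k-quant formats work on blocks of weights with per-block scales and minimums):

```python
def quantize_4bit(weights):
    """Toy symmetric 4-bit quantization: map floats to integers in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid a zero scale
    q = [round(w / scale) for w in weights]          # each value now fits in 4 bits
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.33]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error {max_err:.3f}")
```

Each weight shrinks from 32 bits to 4 at the cost of a small, bounded rounding error (at most half a quantization step), which is why lower-bit variants trade a little quality for a much smaller memory footprint.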

Q: What are the recommended use cases?

The model excels in assistant-like chat applications, text generation, and general dialogue tasks. It's particularly suitable for deployment in scenarios where a balance between performance and resource usage is crucial.
