Llama-2-13B-chat-GGUF

Maintained By
TheBloke

Property          Value
Base Model        Meta's Llama 2 13B
Parameter Count   13 Billion
Context Length    4,096 tokens
License           Meta Custom Commercial License
Training Tokens   2.0T

What is Llama-2-13B-chat-GGUF?

Llama-2-13B-chat-GGUF is Meta's Llama 2 13B chat model converted to the GGUF format for efficient deployment and inference. It ships with multiple quantization options, from 2-bit to 8-bit, letting users trade output quality against memory and compute requirements on consumer hardware.

Implementation Details

The model is available in a range of quantization levels, with file sizes from 5.43GB (Q2_K) up to 13.83GB (Q8_0). It supports GPU acceleration on multiple platforms, including CUDA, ROCm, and Metal, and can be run through frameworks such as llama.cpp, text-generation-webui, and KoboldCpp.

  • Multiple quantization options (Q2_K through Q8_0)
  • Supports GPU layer offloading for improved performance
  • Compatible with major frameworks and libraries
  • Includes built-in chat formatting and system prompts
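The chat formatting mentioned above follows Llama 2's `[INST]`/`<<SYS>>` prompt template, which the chat fine-tune was trained on. A minimal sketch of building a single-turn prompt in Python (the helper name is hypothetical; the beginning-of-sequence token is normally added by the tokenizer, not the prompt string):

```python
def build_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Build a single-turn prompt in the Llama 2 chat template.

    The [INST] / <<SYS>> markers are the delimiters the chat
    fine-tune expects; frontends like llama.cpp need the prompt
    in this form when no chat template is applied automatically.
    """
    return (
        f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    "What is the capital of France?",
)
print(prompt)
```

The model's reply follows the closing `[/INST]`; for multi-turn chat, prior turns are concatenated with the same markers.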

Core Capabilities

  • Optimized for dialogue and chat applications
  • Supports context length of 4096 tokens
  • Includes safety-optimized responses and filtering
  • Performs well on academic benchmarks including code, reasoning, and world knowledge

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficient GGUF format implementation, offering multiple quantization options that make it accessible for various hardware configurations while maintaining performance. It's specifically optimized for dialogue use cases and includes built-in safety measures.

Q: What are the recommended use cases?

The model is best suited for assistant-like chat applications, dialogue systems, and general natural language generation tasks in English. The recommended quantization for most users is Q4_K_M, which offers a good balance between model size and performance.
