Llama-2-13B-chat-GGUF
| Property | Value |
|---|---|
| Base Model | Meta's Llama 2 13B |
| Parameter Count | 13 billion |
| Context Length | 4096 tokens |
| License | Llama 2 Community License (Meta's custom commercial license) |
| Training Tokens | 2.0T (2 trillion) |
What is Llama-2-13B-chat-GGUF?
Llama-2-13B-chat-GGUF is a converted and optimized version of Meta's Llama 2 13B chat model, packaged in the GGUF format for efficient deployment and inference. By offering multiple quantization options from 2-bit to 8-bit, it lets users balance output quality against resource requirements on a wide range of hardware.
Implementation Details
The model is available in various quantization levels, with file sizes ranging from 5.43GB (Q2_K) to 13.83GB (Q8_0). It supports GPU acceleration across multiple platforms, including CUDA, ROCm, and Metal, and can be run through frameworks such as llama.cpp, text-generation-webui, and KoboldCpp; a minimal loading sketch follows the feature list below.
- Multiple quantization options (Q2_K through Q8_0)
- Supports GPU layer offloading for improved performance
- Compatible with major frameworks and libraries
- Includes built-in chat formatting and system prompts
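To illustrate GPU layer offloading, here is a minimal sketch using the llama-cpp-python bindings. The local file name `llama-2-13b-chat.Q4_K_M.gguf` and the layer count are assumptions; adjust them to the quantization file and hardware you actually use.

```python
# Minimal sketch: loading a GGUF quantization with llama-cpp-python and
# offloading layers to the GPU. File name and layer count are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # assumed local file name
    n_ctx=4096,        # full Llama 2 context window
    n_gpu_layers=35,   # layers to offload to the GPU; set 0 for CPU-only
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```

Setting `n_gpu_layers` to a value larger than the model's layer count simply offloads everything that fits; lowering it trades speed for reduced VRAM usage.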
Core Capabilities
- Optimized for dialogue and chat applications (see the prompt-format sketch after this list)
- Supports a context length of 4096 tokens
- Includes safety-optimized responses and filtering
- Performs well on academic benchmarks including code, reasoning, and world knowledge
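Llama 2 chat models expect a specific prompt template built from `[INST]` / `[/INST]` markers with an optional `<<SYS>>` system block. The sketch below assembles that template; the system message text is only an example, not part of the model card.

```python
# Sketch of the Llama 2 chat prompt template ([INST] / <<SYS>> markers).
# The system message used here is only an example.
def build_llama2_prompt(system_message: str, user_message: str) -> str:
    return (
        f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a helpful, respectful and honest assistant.",
    "Explain what the GGUF format is in two sentences.",
)
print(prompt)
```

Many frontends (llama.cpp's chat mode, text-generation-webui) apply this formatting automatically, so manual templating is mainly needed when calling the raw completion API.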
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient GGUF format implementation, offering multiple quantization options that make it accessible for various hardware configurations while maintaining performance. It's specifically optimized for dialogue use cases and includes built-in safety measures.
Q: What are the recommended use cases?
The model is best suited for assistant-like chat applications, dialogue systems, and general natural-language generation tasks in English. For most users, the recommended quantization is Q4_K_M, which offers a good balance between file size and output quality; a download sketch follows below.
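If the model is pulled from the Hugging Face Hub, the Q4_K_M file can be fetched with the `huggingface_hub` library. The repository ID `TheBloke/Llama-2-13B-chat-GGUF` and the file name below are assumptions based on the common naming convention for this release, not details stated in the card above.

```python
# Sketch: downloading the recommended Q4_K_M quantization from the
# Hugging Face Hub. Repository ID and file name are assumptions.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-chat-GGUF",   # assumed repository ID
    filename="llama-2-13b-chat.Q4_K_M.gguf",    # assumed file name
)
print(f"Model downloaded to: {model_path}")
```

The returned path can be passed directly as `model_path` to the llama-cpp-python loading sketch shown earlier.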