Llama-2-13B-chat-GGUF
| Property | Value |
|---|---|
| Base Model | Meta's Llama 2 13B |
| Parameter Count | 13 billion |
| Context Length | 4096 tokens |
| License | Llama 2 Community License (Meta's custom commercial license) |
| Training Tokens | 2.0T (2 trillion) |
What is Llama-2-13B-chat-GGUF?
Llama-2-13B-chat-GGUF is a converted and optimized version of Meta's Llama 2 13B chat model, packaged in the GGUF format for efficient deployment and inference. By offering multiple quantization options from 2-bit to 8-bit, it lets users balance output quality against resource requirements on a wide range of hardware.
Implementation Details
The model is available in various quantization levels, with file sizes ranging from 5.43GB (Q2_K) to 13.83GB (Q8_0). It supports GPU acceleration across multiple platforms, including CUDA, ROCm, and Metal, and can be run through frameworks such as llama.cpp, text-generation-webui, and KoboldCpp; a minimal loading sketch follows the feature list below.
- Multiple quantization options (Q2_K through Q8_0)
- Supports GPU layer offloading for improved performance
- Compatible with major frameworks and libraries
- Includes built-in chat formatting and system prompts
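To illustrate GPU layer offloading, here is a minimal sketch using the llama-cpp-python bindings. The local file name `llama-2-13b-chat.Q4_K_M.gguf` and the layer count are assumptions; adjust them to the quantization file and hardware you actually use.

```python
# Minimal sketch: loading a GGUF quantization with llama-cpp-python and
# offloading layers to the GPU. File name and layer count are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-13b-chat.Q4_K_M.gguf",  # assumed local file name
    n_ctx=4096,        # full Llama 2 context window
    n_gpu_layers=35,   # layers to offload to the GPU; set 0 for CPU-only
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```

Setting `n_gpu_layers` to a value larger than the model's layer count simply offloads everything that fits; lowering it trades speed for reduced VRAM usage.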
Core Capabilities
- Optimized for dialogue and chat applications (see the prompt-format sketch after this list)
- Supports a context length of 4096 tokens
- Includes safety-optimized responses and filtering
- Performs well on academic benchmarks including code, reasoning, and world knowledge
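Llama 2 chat models expect a specific prompt template built from `[INST]` / `[/INST]` markers with an optional `<<SYS>>` system block. The sketch below assembles that template; the system message text is only an example, not part of the model card.

```python
# Sketch of the Llama 2 chat prompt template ([INST] / <<SYS>> markers).
# The system message used here is only an example.
def build_llama2_prompt(system_message: str, user_message: str) -> str:
    return (
        f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

prompt = build_llama2_prompt(
    "You are a helpful, respectful and honest assistant.",
    "Explain what the GGUF format is in two sentences.",
)
print(prompt)
```

Many frontends (llama.cpp's chat mode, text-generation-webui) apply this formatting automatically, so manual templating is mainly needed when calling the raw completion API.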
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its efficient GGUF format implementation, offering multiple quantization options that make it accessible for various hardware configurations while maintaining performance. It's specifically optimized for dialogue use cases and includes built-in safety measures.
Q: What are the recommended use cases?
The model is best suited for assistant-like chat applications, dialogue systems, and general natural-language generation tasks in English. For most users, the recommended quantization is Q4_K_M, which offers a good balance between file size and output quality; a download sketch follows below.
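If the model is pulled from the Hugging Face Hub, the Q4_K_M file can be fetched with the `huggingface_hub` library. The repository ID `TheBloke/Llama-2-13B-chat-GGUF` and the file name below are assumptions based on the common naming convention for this release, not details stated in the card above.

```python
# Sketch: downloading the recommended Q4_K_M quantization from the
# Hugging Face Hub. Repository ID and file name are assumptions.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-chat-GGUF",   # assumed repository ID
    filename="llama-2-13b-chat.Q4_K_M.gguf",    # assumed file name
)
print(f"Model downloaded to: {model_path}")
```

The returned path can be passed directly as `model_path` to the llama-cpp-python loading sketch shown earlier.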