WizardLM-13B-V1.2-GGML
Property | Value
---|---
Base Model | Llama 2 13B |
License | Llama 2 |
Papers | WizardLM Paper |
MT-Bench Score | 7.06 |
AlpacaEval Score | 89.17% |
What is WizardLM-13B-V1.2-GGML?
WizardLM-13B-V1.2-GGML is a quantized version of the WizardLM language model, packaged in the GGML format for efficient CPU inference with optional GPU offloading. Based on Llama 2 13B, the model has been instruction-tuned to provide detailed, helpful responses while maintaining strong performance across multiple benchmarks.
Implementation Details
The model is available in multiple quantization formats ranging from 2-bit to 8-bit precision, letting users trade model size and resource requirements against output quality. The quantized files range from 5.51GB (q2_K) to 13.83GB (q8_0).
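The file sizes above imply an effective bits-per-weight figure for each quantization level, which can be sketched as follows (the 13-billion parameter count is an approximation; k-quant formats mix precisions internally, so the effective figure exceeds the nominal bit width):

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Approximate effective bits per weight of a quantized model file."""
    return file_size_bytes * 8 / n_params

# Approximate parameter count of a 13B Llama model.
N_PARAMS = 13.0e9

# Sizes taken from the table above.
q2k_bpw = bits_per_weight(5.51e9, N_PARAMS)    # ~3.4 bits/weight for q2_K
q8_bpw = bits_per_weight(13.83e9, N_PARAMS)    # ~8.5 bits/weight for q8_0
```

This is why the "2-bit" q2_K file is roughly 40% the size of q8_0 rather than 25%: some tensors are kept at higher precision.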
- Multiple quantization options (q2_K through q8_0)
- Supports both CPU and GPU inference
- Compatible with various frameworks including text-generation-webui and KoboldCpp
- Uses Vicuna-style prompt format for conversations
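The Vicuna-style prompt format mentioned above can be assembled with a small helper like the one below. This is a sketch: the system message shown is the common Vicuna default, and the canonical template for this model should be confirmed against the model card.

```python
def build_vicuna_prompt(turns, system=None):
    """Format (user, assistant) turns into a Vicuna-style prompt.

    The final turn may carry assistant=None, leaving the prompt open
    for the model to complete.
    """
    if system is None:
        # Commonly used Vicuna system message; illustrative, not canonical.
        system = ("A chat between a curious user and an artificial "
                  "intelligence assistant. The assistant gives helpful, "
                  "detailed, and polite answers to the user's questions.")
    parts = [system]
    for user, assistant in turns:
        parts.append(f"USER: {user}")
        if assistant is not None:
            parts.append(f"ASSISTANT: {assistant}")
        else:
            parts.append("ASSISTANT:")
    return " ".join(parts)

prompt = build_vicuna_prompt([("What is GGML?", None)])
```

The trailing "ASSISTANT:" with no text after it is what cues the model to generate its reply.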
Core Capabilities
- Strong performance on MT-Bench (7.06) and AlpacaEval (89.17%)
- Multi-turn conversation support
- Detailed and polite responses to complex instructions
- Context window of 2048 tokens (expandable via RoPE scaling)
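Extending the 2048-token window with linear RoPE scaling amounts to compressing position indices by a fixed factor so that the longer sequence spans the positional range the model was trained on. A minimal sketch of the arithmetic (runtimes such as llama.cpp expose this factor as a RoPE frequency-scale option):

```python
def rope_scale(base_ctx: int, target_ctx: int) -> float:
    """Linear RoPE scaling factor: compress positions so that
    target_ctx tokens span the range the model saw for base_ctx."""
    return base_ctx / target_ctx

# Doubling the 2048-token window calls for a scale factor of 0.5.
scale = rope_scale(2048, 4096)   # 0.5
```

Quality typically degrades as the factor shrinks, so modest extensions (2x to 4x) are the usual range for this technique.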
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance of performance and resource usage: GGML quantization enables local deployment on modest hardware while benchmark scores remain competitive with much larger models.
Q: What are the recommended use cases?
The model is well-suited for conversational AI applications, instruction-following tasks, and general language understanding. It's particularly useful for users needing local deployment with limited computational resources.