WizardLM-13B-V1.2-GGML
Property | Value
---|---
Base Model | Llama 2 13B |
License | Llama 2 |
Papers | WizardLM Paper |
MT-Bench Score | 7.06 |
AlpacaEval Score | 89.17% |
What is WizardLM-13B-V1.2-GGML?
WizardLM-13B-V1.2-GGML is a quantized version of the WizardLM language model, packaged in the GGML format for efficient CPU inference with optional GPU offloading. Based on Llama 2 13B, the model has been instruction-tuned to provide detailed, helpful responses while maintaining strong performance across multiple benchmarks.
Implementation Details
The model is available in multiple quantization formats ranging from 2-bit to 8-bit precision, letting users trade model size and resource requirements against output quality. The quantized files range from 5.51GB (q2_K) to 13.83GB (q8_0).
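The file sizes above imply an effective bits-per-weight figure for each quantization level, which can be sketched as follows (the 13-billion parameter count is an approximation; k-quant formats mix precisions internally, so the effective figure exceeds the nominal bit width):

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Approximate effective bits per weight of a quantized model file."""
    return file_size_bytes * 8 / n_params

# Approximate parameter count of a 13B Llama model.
N_PARAMS = 13.0e9

# Sizes taken from the table above.
q2k_bpw = bits_per_weight(5.51e9, N_PARAMS)    # ~3.4 bits/weight for q2_K
q8_bpw = bits_per_weight(13.83e9, N_PARAMS)    # ~8.5 bits/weight for q8_0
```

This is why the "2-bit" q2_K file is roughly 40% the size of q8_0 rather than 25%: some tensors are kept at higher precision.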
- Multiple quantization options (q2_K through q8_0)
- Supports both CPU and GPU inference
- Compatible with various frameworks including text-generation-webui and KoboldCpp
- Uses Vicuna-style prompt format for conversations
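The Vicuna-style prompt format mentioned above can be assembled with a small helper like the one below. This is a sketch: the system message shown is the common Vicuna default, and the canonical template for this model should be confirmed against the model card.

```python
def build_vicuna_prompt(turns, system=None):
    """Format (user, assistant) turns into a Vicuna-style prompt.

    The final turn may carry assistant=None, leaving the prompt open
    for the model to complete.
    """
    if system is None:
        # Commonly used Vicuna system message; illustrative, not canonical.
        system = ("A chat between a curious user and an artificial "
                  "intelligence assistant. The assistant gives helpful, "
                  "detailed, and polite answers to the user's questions.")
    parts = [system]
    for user, assistant in turns:
        parts.append(f"USER: {user}")
        if assistant is not None:
            parts.append(f"ASSISTANT: {assistant}")
        else:
            parts.append("ASSISTANT:")
    return " ".join(parts)

prompt = build_vicuna_prompt([("What is GGML?", None)])
```

The trailing "ASSISTANT:" with no text after it is what cues the model to generate its reply.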
Core Capabilities
- Strong performance on MT-Bench (7.06) and AlpacaEval (89.17%)
- Multi-turn conversation support
- Detailed and polite responses to complex instructions
- Context window of 2048 tokens (expandable via RoPE scaling)
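Extending the 2048-token window with linear RoPE scaling amounts to compressing position indices by a fixed factor so that the longer sequence spans the positional range the model was trained on. A minimal sketch of the arithmetic (runtimes such as llama.cpp expose this factor as a RoPE frequency-scale option):

```python
def rope_scale(base_ctx: int, target_ctx: int) -> float:
    """Linear RoPE scaling factor: compress positions so that
    target_ctx tokens span the range the model saw for base_ctx."""
    return base_ctx / target_ctx

# Doubling the 2048-token window calls for a scale factor of 0.5.
scale = rope_scale(2048, 4096)   # 0.5
```

Quality typically degrades as the factor shrinks, so modest extensions (2x to 4x) are the usual range for this technique.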
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance of performance and resource usage: GGML quantization enables local deployment on modest hardware while benchmark scores remain competitive with much larger models.
Q: What are the recommended use cases?
The model is well-suited for conversational AI applications, instruction-following tasks, and general language understanding. It's particularly useful for users needing local deployment with limited computational resources.