Guanaco-65B-GGML

Maintained by: TheBloke

Base Model: LLaMA 65B
License: Other (Apache 2 for adapters)
Paper: QLoRA Paper
Quantization Options: 2-bit to 8-bit GGML

What is guanaco-65B-GGML?

Guanaco-65B-GGML is a quantized release of the Guanaco language model in the GGML format, intended for CPU inference with optional GPU offloading. Based on the LLaMA architecture and fine-tuned with QLoRA on the OASST1 dataset, it is provided in quantization levels from 2-bit to 8-bit precision so that output quality can be traded against memory and compute requirements.

Implementation Details

The release includes multiple quantization variants, ranging from a lightweight 2-bit file (27.33 GB) to full 8-bit precision (69.37 GB). Both the original llama.cpp quantization methods (q4_0, q4_1, q5_0, q5_1, q8_0) and the newer k-quant methods are provided, letting users trade file size against output quality.

  • Supports multiple quantization levels (q2_K through q8_0)
  • Compatible with llama.cpp and various UI frameworks
  • Supports offloading layers to the GPU to speed up inference
  • Requires a specific prompt template: "### Human: [prompt] ### Assistant:" (illustrated in the sketch after this list)
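
As a concrete illustration, here is a minimal sketch of loading one of the quantized files with the llama-cpp-python bindings, offloading some layers to the GPU, and applying the required prompt template. It assumes an older llama-cpp-python release that still reads GGML files (recent releases expect GGUF); the file name, layer count, and thread count are placeholders to adjust for your hardware.

```python
# Minimal sketch: load a GGML quant with llama-cpp-python and generate a
# response using the required "### Human: ... ### Assistant:" template.
# Assumes a llama-cpp-python release that still supports GGML files;
# file name and hardware settings below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="guanaco-65B.ggmlv3.q4_K_M.bin",  # placeholder local file name
    n_ctx=2048,        # Guanaco's context window
    n_gpu_layers=40,   # layers to offload to the GPU; 0 keeps everything on CPU
    n_threads=8,       # CPU threads for the layers that stay on the CPU
)

prompt = "### Human: Explain what quantization does to a language model. ### Assistant:"

out = llm(
    prompt,
    max_tokens=256,
    stop=["### Human:"],  # stop before the model starts a new turn
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

Setting n_gpu_layers to 0 keeps inference entirely on the CPU; raising it shifts more of the model into VRAM at the cost of GPU memory.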

Core Capabilities

  • Competitive performance with commercial chatbot systems
  • Multi-lingual response capabilities
  • Efficient CPU+GPU inference with adjustable resource usage
  • Supports a context window of 2048 tokens (see the prompt-budget sketch below)
  • Performs well on MMLU benchmark (62.2% accuracy)
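
Because the context window is fixed at 2048 tokens, it helps to check that a prompt leaves enough room for the completion before generating. The sketch below, again assuming the llama-cpp-python bindings and a placeholder file name, uses the library's tokenizer for that check.

```python
# Sketch: verify that prompt + requested completion fit in the 2048-token
# context window before generating. Assumes a llama-cpp-python release with
# GGML support; the file name is a placeholder.
from llama_cpp import Llama

MAX_CTX = 2048        # Guanaco's context window
MAX_NEW_TOKENS = 256  # completion budget for this request

llm = Llama(model_path="guanaco-65B.ggmlv3.q2_K.bin", n_ctx=MAX_CTX)

prompt = "### Human: Summarize the QLoRA fine-tuning approach. ### Assistant:"
n_prompt_tokens = len(llm.tokenize(prompt.encode("utf-8")))

if n_prompt_tokens + MAX_NEW_TOKENS > MAX_CTX:
    raise ValueError(
        f"Prompt uses {n_prompt_tokens} tokens, leaving too little room "
        f"for {MAX_NEW_TOKENS} new tokens in a {MAX_CTX}-token window."
    )
```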

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the capability of a 65B-parameter model with a range of quantization options, making it feasible to run on consumer hardware through mixed CPU+GPU inference. The multiple quantization levels let users balance output quality against memory and compute requirements.
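
As a rough way to reason about that trade-off, the sketch below estimates the total memory needed for the two file sizes quoted above; the fixed overhead for the KV cache and scratch buffers is an assumed ballpark figure, not a measured value.

```python
# Back-of-the-envelope memory planning for the quant variants mentioned above.
# File sizes come from the model card; the overhead for the KV cache and
# scratch buffers is an assumed ballpark, not a measurement.
FILE_SIZES_GB = {
    "2-bit (q2_K)": 27.33,  # smallest variant listed above
    "8-bit (q8_0)": 69.37,  # largest variant listed above
}
OVERHEAD_GB = 2.0  # assumed working-memory margin

def required_memory_gb(quant: str) -> float:
    """Estimated combined RAM + VRAM needed to load the given quant."""
    return FILE_SIZES_GB[quant] + OVERHEAD_GB

for name in FILE_SIZES_GB:
    print(f"{name}: ~{required_memory_gb(name):.1f} GB total memory")
```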

Q: What are the recommended use cases?

The model is best suited for research purposes and for local deployment of high-quality chatbot capabilities. It is particularly useful when you need a capable language model that can run on limited hardware thanks to quantization.
