Llama-2-13B-GGML

Maintained By
TheBloke


| Property | Value |
|---|---|
| Base Model | Meta Llama-2 13B |
| License | Llama 2 |
| Paper | arXiv:2307.09288 |
| Format | GGML (deprecated) |

What is Llama-2-13B-GGML?

Llama-2-13B-GGML is a quantized version of Meta's Llama 2 13B-parameter language model, packaged in the GGML format for CPU and CPU+GPU inference. It adapts the original Llama 2 weights into variants ranging from 2-bit to 8-bit precision, letting users trade output quality against memory and compute requirements.

Implementation Details

The model comes in multiple quantization variants, ranging from 5.51GB to 13.83GB in size. It utilizes advanced k-quant methods for efficient compression while maintaining performance. The implementation supports GPU offloading and is compatible with multiple frameworks including text-generation-webui, KoboldCpp, and LM Studio.

  • Multiple quantization options (Q2_K through Q8_0)
  • Compatible with various UI frameworks and libraries
  • Support for GPU acceleration
  • Context length of up to 4096 tokens
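GGML-era models like this one were typically run with the legacy llama.cpp `main` binary, which accepts the model path, context size, and number of GPU-offloaded layers as flags. The sketch below assembles such an invocation; the file name is hypothetical and the exact flag set varies by llama.cpp version, so check `--help` on your build before running.

```python
# Sketch: assemble a legacy llama.cpp CLI invocation for a GGML model file.
# The file name and flag names are assumptions based on typical GGML-era
# llama.cpp usage; verify them against your llama.cpp version's --help.

def build_llama_cpp_command(model_path, prompt, n_gpu_layers=0, ctx_size=4096):
    """Return the argument list for the legacy llama.cpp `main` binary."""
    return [
        "./main",
        "-m", model_path,           # path to the quantized GGML file
        "-c", str(ctx_size),        # context length (Llama 2 supports up to 4096)
        "-ngl", str(n_gpu_layers),  # layers to offload to the GPU (0 = CPU only)
        "-p", prompt,
    ]

cmd = build_llama_cpp_command(
    "llama-2-13b.ggmlv3.q4_K_M.bin",  # hypothetical file name
    "Explain GGML quantization in one sentence.",
    n_gpu_layers=32,
)
print(" ".join(cmd))
```

Increasing `n_gpu_layers` moves more of the model onto the GPU until VRAM runs out; the remainder stays on the CPU.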

Core Capabilities

  • Text generation and completion tasks
  • Efficient CPU+GPU inference
  • Flexible deployment options across different hardware configurations
  • Support for multiple quantization levels to balance performance and resource usage

Frequently Asked Questions

Q: What makes this model unique?

This model is unique in offering multiple quantization options using the GGML format, allowing users to choose the optimal balance between model size, performance, and resource usage. It ranges from highly compressed 2-bit versions to high-fidelity 8-bit versions.
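A rough back-of-envelope check shows where these sizes come from: a quantized file is approximately parameters × bits-per-weight / 8. For 13B parameters at 4 bits that gives about 6.5 GB, consistent with the 5.51-13.83 GB range quoted above once k-quant mixed precisions and metadata add some overhead. The figures below are illustrations of that formula, not exact file sizes.

```python
# Back-of-envelope estimate of quantized file size:
#   size_bytes ~= n_params * bits_per_weight / 8
# Real GGML k-quant files are somewhat larger, since different tensor groups
# use mixed precisions and the file carries metadata.

def approx_size_gb(n_params, bits_per_weight):
    """Approximate quantized model size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

for bits in (2, 4, 8):
    print(f"{bits}-bit: roughly {approx_size_gb(13e9, bits):.1f} GB")
```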

Q: What are the recommended use cases?

The model is best suited for general text generation tasks where efficient local deployment is required. It's particularly useful for scenarios where GPU resources are limited or when CPU-based inference is preferred. The various quantization options allow users to choose the best version for their specific hardware constraints.
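Choosing among the variants usually comes down to fitting the file plus runtime overhead into available RAM. Here is a minimal sketch of that selection logic; only the smallest (5.51 GB) and largest (13.83 GB) sizes come from this model card, while the intermediate entries and the 2 GB overhead figure are assumptions for illustration.

```python
# Sketch: pick the largest quantization variant that fits a RAM budget.
# Only the 5.51 GB and 13.83 GB sizes come from the model card; the
# intermediate sizes and the 2 GB runtime overhead are assumptions.

VARIANT_SIZES_GB = {
    "q2_K": 5.51,    # from the model card
    "q4_K_M": 7.9,   # illustrative
    "q5_K_M": 9.2,   # illustrative
    "q8_0": 13.83,   # from the model card
}

def pick_variant(ram_budget_gb, overhead_gb=2.0, sizes=VARIANT_SIZES_GB):
    """Return the largest variant whose file size plus runtime overhead
    fits within the RAM budget, or None if none fit."""
    fitting = [(size, name) for name, size in sizes.items()
               if size + overhead_gb <= ram_budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_variant(16))  # plenty of RAM: highest-fidelity variant
print(pick_variant(8))   # tight budget: most compressed variant
```

With GPU offloading, part of the model sits in VRAM instead, so the effective budget is system RAM plus VRAM rather than RAM alone.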
