Yi-34B-Chat-GGUF

Maintained By
TheBloke

Yi-34B-Chat-GGUF

PropertyValue
Parameter Count34.4B
Model TypeChat Model (GGUF Format)
LicenseYi License
AuthorTheBloke (Quantized) / 01-ai (Original)

What is Yi-34B-Chat-GGUF?

Yi-34B-Chat-GGUF is a quantized version of the Yi-34B-Chat model, optimized for efficient deployment on both CPU and GPU systems. This model represents a significant achievement in making large language models more accessible, offering various quantization levels from 2-bit to 8-bit to balance performance and resource requirements.

Implementation Details

The model is available in multiple quantization formats ranging from Q2_K (2-bit) to Q8_0 (8-bit), with file sizes varying from 14.56GB to 36.54GB. It uses the ChatML prompt template and supports diverse deployment options through frameworks like llama.cpp, text-generation-webui, and KoboldCpp.

  • Multiple quantization options (Q2_K to Q8_0) for different performance/size tradeoffs
  • Supports context window from base model
  • Compatible with major GGUF-supporting frameworks
  • Optimized for both CPU and GPU inference

Core Capabilities

  • Strong performance on benchmarks (MMLU: 76.3%, CMMLU: 83.7%)
  • Efficient resource utilization through quantization
  • Multilingual support (English/Chinese)
  • Flexible deployment options across different platforms

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its excellent balance of performance and efficiency, offering multiple quantization options while maintaining strong benchmark scores. It's particularly notable for achieving near-original model performance in 4-bit and 8-bit quantized versions.

Q: What are the recommended use cases?

The model is well-suited for chat applications, creative writing, and general language understanding tasks. The different quantization options make it adaptable to various hardware configurations, from resource-constrained environments (using Q2_K) to high-performance systems (using Q8_0).

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.