# Yi-34B-Chat-GGUF
| Property | Value |
|---|---|
| Parameter Count | 34.4B |
| Model Type | Chat Model (GGUF Format) |
| License | Yi License |
| Author | TheBloke (Quantized) / 01-ai (Original) |
## What is Yi-34B-Chat-GGUF?
Yi-34B-Chat-GGUF is a quantized version of the Yi-34B-Chat model, packaged for efficient deployment on both CPU and GPU systems. By offering quantization levels from 2-bit to 8-bit, it makes the 34B-parameter model usable on a much wider range of hardware, letting users trade output quality against memory and compute requirements.
## Implementation Details
The model is available in multiple quantization formats ranging from Q2_K (2-bit) to Q8_0 (8-bit), with file sizes from 14.56 GB to 36.54 GB. It uses the ChatML prompt template and runs on GGUF-compatible frameworks such as llama.cpp, text-generation-webui, and KoboldCpp; a loading sketch follows the list below.
- Multiple quantization options (Q2_K to Q8_0) for different performance/size tradeoffs
- Inherits the context window of the original Yi-34B-Chat base model
- Compatible with major GGUF-supporting frameworks
- Optimized for both CPU and GPU inference
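As a concrete illustration, here is a minimal loading sketch using llama-cpp-python. The GGUF filename, context size, and GPU layer count are assumptions: substitute the quant file you actually downloaded and tune offloading to your hardware.

```python
# Minimal sketch: load a quantized Yi-34B-Chat GGUF file with llama-cpp-python.
# The filename and n_gpu_layers value below are illustrative assumptions --
# pick the quant file you downloaded and adjust offloading to your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="yi-34b-chat.Q4_K_M.gguf",  # assumed filename; 4-bit medium quant
    n_ctx=4096,        # context window; adjust to match the base model's limit
    n_gpu_layers=35,   # layers offloaded to GPU; use 0 for CPU-only inference
)

# Yi-34B-Chat expects the ChatML prompt template:
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Explain GGUF quantization in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

out = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```

Lower- or higher-bit quants load the same way; only `model_path` (and the memory footprint) changes.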
## Core Capabilities
- Strong performance on benchmarks (MMLU: 76.3%, CMMLU: 83.7%)
- Efficient resource utilization through quantization
- Multilingual support (English/Chinese)
- Flexible deployment options across different platforms
## Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balance of performance and efficiency: the range of quantization options covers most hardware budgets, and the 4-bit and 8-bit variants in particular retain near-original output quality.
Q: What are the recommended use cases?
The model is well-suited for chat applications, creative writing, and general language understanding tasks. The different quantization options make it adaptable to various hardware configurations, from resource-constrained environments (using Q2_K) to high-performance systems (using Q8_0).
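For example, a single quant file can be fetched programmatically rather than downloading the whole repository. Below is a hedged sketch using huggingface_hub; the filename follows TheBloke's usual naming convention and should be verified against the repository's file list.

```python
# Sketch: download one quantized file instead of the full repository.
# The repo_id matches TheBloke's published repo; the filename is an assumption
# based on his usual naming scheme -- check the repo's file list to confirm.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Yi-34B-Chat-GGUF",
    filename="yi-34b-chat.Q2_K.gguf",  # smallest (2-bit) quant, ~14.56 GB
)
print(f"Model saved to: {path}")
```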