# CausalLM-14B-GGUF
| Property | Value |
|---|---|
| Parameter Count | 14.2B |
| License | WTFPL |
| Format | GGUF (Various Quantizations) |
| Languages | English, Chinese |
## What is CausalLM-14B-GGUF?
CausalLM-14B-GGUF is a 14B-parameter large language model quantized to the GGUF format for efficient deployment. Built on a LLaMA2-compatible architecture, it performs strongly across standard benchmarks, scoring 67.36% on MMLU and 73.10% on CEval and outperforming many larger models.
## Implementation Details
The model uses the ChatML prompt format (see the template after the list below) and comes in multiple GGUF quantizations ranging from 2-bit to 8-bit precision. It was trained on a curated dataset of 1.3B tokens that combines synthetic data with carefully selected entries from sources including Wikipedia, Fandom, and Moegirlpedia.
- Multiple quantization options (Q4_0 through Q8_0) offering different size/quality trade-offs
- Supports a context length of 4096 tokens
- Uses the same efficient attention implementation as the original LLaMA2
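ChatML wraps each conversation turn in `<|im_start|>` and `<|im_end|>` markers. A minimal prompt for this model looks like the following (the system message is an illustrative placeholder):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```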
## Core Capabilities
- Strong performance on mathematical reasoning (70.12% on GSM8K)
- Exceptional multilingual abilities (English and Chinese)
- 88.26% win rate on AlpacaEval Leaderboard
- Optimized for both CPU and GPU inference (see the loading sketch below)
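As one possible deployment route, the GGUF files can be loaded with the llama-cpp-python bindings. This is a minimal sketch, assuming a locally downloaded quantization; the file name, prompt, and sampling settings are illustrative, not values from the model card:

```python
from llama_cpp import Llama

# Load a GGUF quantization. The file name is a hypothetical placeholder;
# pick the quantization that fits your hardware (e.g. Q4_0 for less RAM).
llm = Llama(
    model_path="causallm_14b.Q5_1.gguf",  # assumed local path
    n_ctx=4096,       # matches the model's supported context length
    n_gpu_layers=-1,  # offload all layers to the GPU; use 0 for CPU-only
)

# Build a ChatML prompt (see the template above).
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Translate 'large language model' into Chinese.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Stop on the end-of-turn marker so generation ends with the reply.
output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```

Running CPU-only is a matter of setting `n_gpu_layers=0`; smaller quantizations trade some output quality for lower memory use.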
## Frequently Asked Questions
**Q: What makes this model unique?**
The model stands out for its strong performance-to-size ratio, outperforming all models under 70B parameters in most quantitative evaluations. It is also optimized for both CPU and GPU deployment through the GGUF format.
**Q: What are the recommended use cases?**
The model handles both academic and general-purpose tasks well, making it suitable for mathematical reasoning, multilingual (English/Chinese) applications, and general text generation. It is particularly effective in deployments that need to balance output quality against resource use.