# CausalLM-14B-GGUF
| Property | Value |
|---|---|
| Parameter Count | 14.2B |
| License | WTFPL |
| Format | GGUF (Various Quantizations) |
| Languages | English, Chinese |
## What is CausalLM-14B-GGUF?
CausalLM-14B-GGUF is a 14B-parameter large language model quantized to the GGUF format for efficient deployment. Built on a LLaMA2-compatible architecture, it performs strongly across standard benchmarks, scoring 67.36% on MMLU and 73.10% on CEval and outperforming many larger models.
## Implementation Details
The model uses the ChatML prompt format (see the template after the list below) and comes in multiple GGUF quantizations ranging from 2-bit to 8-bit precision. It was trained on a curated dataset of 1.3B tokens that combines synthetic data with carefully selected entries from sources including Wikipedia, Fandom, and Moegirlpedia.
- Multiple quantization options (Q4_0 through Q8_0) offering different size/quality trade-offs
- Supports a context length of 4096 tokens
- Uses the same efficient attention implementation as the original LLaMA2
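ChatML wraps each conversation turn in `<|im_start|>` and `<|im_end|>` markers. A minimal prompt for this model looks like the following (the system message is an illustrative placeholder):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```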
## Core Capabilities
- Strong performance on mathematical reasoning (70.12% on GSM8K)
- Exceptional multilingual abilities (English and Chinese)
- 88.26% win rate on AlpacaEval Leaderboard
- Optimized for both CPU and GPU inference (see the loading sketch below)
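As one possible deployment route, the GGUF files can be loaded with the llama-cpp-python bindings. This is a minimal sketch, assuming a locally downloaded quantization; the file name, prompt, and sampling settings are illustrative, not values from the model card:

```python
from llama_cpp import Llama

# Load a GGUF quantization. The file name is a hypothetical placeholder;
# pick the quantization that fits your hardware (e.g. Q4_0 for less RAM).
llm = Llama(
    model_path="causallm_14b.Q5_1.gguf",  # assumed local path
    n_ctx=4096,       # matches the model's supported context length
    n_gpu_layers=-1,  # offload all layers to the GPU; use 0 for CPU-only
)

# Build a ChatML prompt (see the template above).
prompt = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Translate 'large language model' into Chinese.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Stop on the end-of-turn marker so generation ends with the reply.
output = llm(prompt, max_tokens=256, stop=["<|im_end|>"])
print(output["choices"][0]["text"])
```

Running CPU-only is a matter of setting `n_gpu_layers=0`; smaller quantizations trade some output quality for lower memory use.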
## Frequently Asked Questions
**Q: What makes this model unique?**
The model stands out for its strong performance-to-size ratio, outperforming all models under 70B parameters in most quantitative evaluations. It is also optimized for both CPU and GPU deployment through the GGUF format.
**Q: What are the recommended use cases?**
The model handles both academic and general-purpose tasks well, making it suitable for mathematical reasoning, multilingual (English/Chinese) applications, and general text generation. It is particularly effective in deployments that need to balance output quality against resource use.