Goliath-120B-GGUF
| Property | Value |
|---|---|
| Parameter Count | 118B parameters |
| Model Type | LLaMA architecture |
| License | Llama 2 |
| Author | TheBloke (quantization) / Alpin (original merge) |
What is goliath-120b-GGUF?
Goliath-120B-GGUF is a large language model created by merging two fine-tuned Llama 2 70B models, Xwin and Euryale, into a single roughly 118B-parameter network. TheBloke converted the merged model to the GGUF format so it can be deployed efficiently across a wide range of computing environments.
Implementation Details
The GGUF release offers multiple quantization options, from 2-bit to 8-bit precision, letting users trade file size against output quality. The underlying merge interleaves blocks of Xwin and Euryale layers in an arrangement chosen to preserve the strengths of both parent models.
- Multiple quantization options (Q2_K through Q8_0)
- File sizes ranging from 49.63 GB to 125.12 GB
- Compatible with recent llama.cpp builds and various UI platforms
- Supports GPU acceleration via layer offloading (see the sketch after this list)
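
The following is a minimal sketch of loading a quantized file through the llama-cpp-python bindings, one of several llama.cpp front ends. The file path, layer count, and sampling settings are illustrative assumptions rather than values from this card; tune `n_gpu_layers` to your available VRAM.

```python
# Minimal sketch using llama-cpp-python; paths and settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./goliath-120b.Q4_K_M.gguf",  # assumed local path to a downloaded quant
    n_ctx=4096,       # matches the model's 4096-token context window
    n_gpu_layers=40,  # number of layers to offload to the GPU; 0 = CPU only
)

result = llm(
    "USER: Explain GGUF quantization in one paragraph. ASSISTANT:",
    max_tokens=256,
    temperature=0.7,
)
print(result["choices"][0]["text"])
```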
Core Capabilities
- Advanced text generation and conversational use
- Supports both Vicuna and Alpaca prompting formats (sketched below)
- Context window of up to 4096 tokens
- Efficient deployment options across a range of hardware configurations
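
To make the two prompting formats concrete, here is a sketch of both templates as simple helpers. The exact system-prompt wording is an assumption; the canonical templates are listed on the model card.

```python
# Hypothetical helpers illustrating the two prompt styles the card mentions.

def vicuna_prompt(user_message: str) -> str:
    # Vicuna style: USER/ASSISTANT turn markers.
    return f"USER: {user_message}\nASSISTANT:"

def alpaca_prompt(instruction: str) -> str:
    # Alpaca style: instruction/response section headers.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

print(vicuna_prompt("Summarize the GGUF format."))
```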
Frequently Asked Questions
Q: What makes this model unique?
Its architecture, which interleaves layers from two fine-tuned 70B models into a single network, combined with the wide range of quantization options, makes it adaptable to many deployment scenarios while maintaining strong output quality.
Q: What are the recommended use cases?
The model is well suited to conversational AI, text generation, and general language understanding tasks. The Q4_K_M quantization is recommended as a balance of size and quality, while the Q5_K_S and Q5_K_M versions suit scenarios that require higher accuracy. A download sketch follows.
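
As a practical starting point, a single quant file can be fetched with `huggingface_hub`. The filename below follows TheBloke's usual naming pattern and is an assumption; verify it against the repository's file list, and note that quants larger than about 50 GB in these repos are typically split into parts that must be joined before use.

```python
# Sketch of downloading one quant file; the filename is assumed from
# TheBloke's naming convention -- check the repo's file list first.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/goliath-120b-GGUF",
    filename="goliath-120b.Q2_K.gguf",  # smallest quant; larger ones may be split
)
print(f"Model saved to {path}")
```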