# Orca Mini 13B GGML
| Property | Value |
|---|---|
| License | MIT |
| Research Paper | [Orca Paper](https://arxiv.org/abs/2306.02707) |
| Author | TheBloke |
| Language | English |
## What is orca_mini_13B-GGML?
Orca Mini 13B GGML is a quantized version of the original Orca Mini 13B model, optimized for CPU and GPU inference. It's based on OpenLLaMA-13B and has been trained on explain-tuned datasets from WizardLM, Alpaca, and Dolly-V2, implementing approaches from the Orca research paper.
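Like the upstream orca_mini models, it expects a system/user/response prompt layout. A representative template is shown below; the system message is one of the defaults used in the upstream orca_mini cards, and the exact wording should be treated as an assumption:

```
### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
{prompt}

### Response:
```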
## Implementation Details
The model is released in quantization formats ranging from 2-bit to 8-bit, each offering a different trade-off between file size, inference speed, and output quality. Both the original llama.cpp quantization methods (q4_0, q4_1, q5_0, etc.) and the newer k-quant methods are provided.
- Multiple quantization options from 5.51GB (q2_K) to 13.83GB (q8_0)
- Compatible with llama.cpp and various UI frameworks
- Supports GPU offloading for improved performance (see the loading sketch after this list)
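As a minimal sketch of local inference, the snippet below loads one of the quantized files with llama-cpp-python. It assumes a GGML-compatible release of that library (0.1.78 or earlier; later versions expect the GGUF format), and the file name and layer count are placeholders to adjust for your download and hardware:

```python
# Minimal sketch: load a GGML quant with llama-cpp-python (<= 0.1.78).
# The model path and n_gpu_layers value are assumptions; adjust as needed.
from llama_cpp import Llama

llm = Llama(
    model_path="./orca-mini-13b.ggmlv3.q4_0.bin",  # any of the released quant files
    n_ctx=2048,       # the model's maximum context length
    n_gpu_layers=32,  # layers to offload to the GPU; set 0 for CPU-only inference
)
```

In general, the smaller quants (such as q2_K) trade output quality for a lower memory footprint, while q8_0 stays closest to the original weights.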
## Core Capabilities
- Instruction-following and conversational AI tasks (see the usage sketch after this list)
- Efficient CPU/GPU inference across the available quantization options
- Supports a context length of 2048 tokens
- Trained to imitate the step-by-step explanation style of GPT-3.5-turbo, following the Orca approach
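Continuing the loading sketch above, a prompt in the template shown earlier can be passed straight to the loaded model. The question and sampling settings here are purely illustrative:

```python
# Build a prompt in the system/user/response template described earlier.
prompt = (
    "### System:\n"
    "You are an AI assistant that follows instruction extremely well. "
    "Help as much as you can.\n\n"
    "### User:\n"
    "Explain why the sky appears blue.\n\n"
    "### Response:\n"
)

# Generate a completion; stop before the model starts a new user turn.
output = llm(prompt, max_tokens=256, temperature=0.7, stop=["### User:"])
print(output["choices"][0]["text"])
```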
## Frequently Asked Questions
**Q: What makes this model unique?**
This model combines the efficiency of GGML quantization with Orca-style explanation tuning, making it well suited to local deployment on modest hardware while retaining strong instruction-following quality.
**Q: What are the recommended use cases?**
The model is a good fit for local deployment on machines with limited computational resources. It is particularly well suited to text generation, instruction following, and conversational AI applications where a balance between output quality and resource usage is needed.