# Orca Mini 3B GGML
| Property | Value |
|---|---|
| License | MIT |
| Research Paper | Orca Paper |
| Base Architecture | OpenLLaMA 3B |
| Training Datasets | WizardLM, Alpaca, Dolly-V2 |
## What is orca_mini_3B-GGML?
Orca Mini 3B GGML is a CPU-optimized conversion of Pankaj Mathur's Orca Mini 3B model, designed for efficient deployment across a range of computing environments. It packages the original weights in quantized form, with compression options from 4-bit to 8-bit precision that trade output quality against memory and compute requirements.
## Implementation Details
The model is available in several quantization formats (q4_0, q4_1, q5_0, q5_1, and q8_0), with file sizes ranging from 1.93 GB to 3.64 GB. It is optimized for inference with llama.cpp and supports CPU inference with optional GPU offloading through interfaces such as text-generation-webui and KoboldCpp.
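As a sketch of programmatic local inference, the snippet below loads one of the quantized files with llama-cpp-python (note: an older release is needed, since current versions read GGUF rather than GGML files). The file name, thread count, and sampling settings are illustrative assumptions, not values from the model card.

```python
from llama_cpp import Llama

# Load a 4-bit quantized GGML file (assumed filename; pick any of the
# q4_0/q4_1/q5_0/q5_1/q8_0 variants depending on your RAM budget).
llm = Llama(
    model_path="orca-mini-3b.ggmlv3.q4_0.bin",
    n_ctx=2048,    # matches the model's maximum context length
    n_threads=8,   # illustrative; tune for your CPU
)

prompt = "### User:\nExplain GGML quantization in one sentence.\n\n### Response:\n"
output = llm(prompt, max_tokens=128, stop=["### User:"])
print(output["choices"][0]["text"])
```

The same files can also be loaded without any code through text-generation-webui or KoboldCpp; the snippet above is only needed for embedding the model in your own application.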
Training details for the underlying Orca Mini 3B model:
- Training used 8× A100 (80 GB) GPUs with DeepSpeed and ZeRO stage 3 optimization (a minimal config sketch follows this list)
- Training ran for approximately 4 hours with a batch size of 64
- The training data was generated using 15 system instructions from the Orca research paper
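For context on the bullets above, a minimal DeepSpeed configuration enabling ZeRO stage 3 with the stated batch size might look like the following. This is an assumed sketch, not the authors' actual configuration; precision and communication settings are illustrative.

```python
# Minimal DeepSpeed config sketch (assumed, not the authors' actual file).
# ZeRO stage 3 partitions optimizer state, gradients, and parameters
# across the 8 GPUs so a 3B model trains comfortably on 80 GB cards.
ds_config = {
    "train_batch_size": 64,            # global batch size from the model card
    "gradient_accumulation_steps": 1,  # illustrative
    "bf16": {"enabled": True},         # illustrative precision choice
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
    },
}
# This dict would be passed as `config` to deepspeed.initialize(...)
# alongside the model and optimizer.
```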
## Core Capabilities
- Efficient CPU+GPU inference with multiple quantization options
- Support for context length up to 2048 tokens
- Follows instruction-based prompting with system, user, and input components (see the prompt sketch after this list)
- Optimized for both resource-constrained and high-performance environments
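To illustrate the prompting scheme mentioned above, the helper below assembles a prompt in the system/user/input layout used by the original orca_mini models. The header strings follow the upstream model card's template and the example strings are my own; verify both against the card before relying on them.

```python
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Assemble an orca_mini-style prompt (### System / ### User / ### Input)."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:  # the Input block is optional
        prompt += f"### Input:\n{input_text}\n\n"
    return prompt + "### Response:\n"

prompt = build_prompt(
    system="You are an AI assistant that follows instruction extremely well.",
    instruction="Summarize the text below in one sentence.",
    input_text="GGML stores quantized weights for fast CPU inference.",
)
print(prompt)
```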
## Frequently Asked Questions
Q: What makes this model unique?

A: It combines the efficiency of GGML quantization with Orca-style instruction tuning, making it particularly suitable for deployment in resource-constrained environments while maintaining good output quality.
Q: What are the recommended use cases?

A: The model is ideal for applications that require local deployment, particularly when GPU resources are limited. It suits text generation, instruction following, and general language-understanding tasks where the balance between quality and resource usage matters.