# Orca Mini 3B GGML
| Property | Value |
|---|---|
| License | MIT |
| Research Paper | Orca Paper |
| Base Architecture | OpenLLaMA 3B |
| Training Datasets | WizardLM, Alpaca, Dolly-V2 |
## What is orca_mini_3B-GGML?
Orca Mini 3B GGML is a CPU-optimized conversion of Pankaj Mathur's Orca Mini 3B model, designed for efficient deployment across a range of computing environments. It packages the original weights in quantized form, with compression options from 4-bit to 8-bit precision that trade output quality against memory and compute requirements.
## Implementation Details
The model is available in several quantization formats (q4_0, q4_1, q5_0, q5_1, and q8_0), with file sizes ranging from 1.93 GB to 3.64 GB. It is optimized for inference with llama.cpp and supports CPU inference with optional GPU offloading through interfaces such as text-generation-webui and KoboldCpp.
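As a sketch of programmatic local inference, the snippet below loads one of the quantized files with llama-cpp-python (note: an older release is needed, since current versions read GGUF rather than GGML files). The file name, thread count, and sampling settings are illustrative assumptions, not values from the model card.

```python
from llama_cpp import Llama

# Load a 4-bit quantized GGML file (assumed filename; pick any of the
# q4_0/q4_1/q5_0/q5_1/q8_0 variants depending on your RAM budget).
llm = Llama(
    model_path="orca-mini-3b.ggmlv3.q4_0.bin",
    n_ctx=2048,    # matches the model's maximum context length
    n_threads=8,   # illustrative; tune for your CPU
)

prompt = "### User:\nExplain GGML quantization in one sentence.\n\n### Response:\n"
output = llm(prompt, max_tokens=128, stop=["### User:"])
print(output["choices"][0]["text"])
```

The same files can also be loaded without any code through text-generation-webui or KoboldCpp; the snippet above is only needed for embedding the model in your own application.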
Training details for the underlying Orca Mini 3B model:
- Training used 8× A100 (80 GB) GPUs with DeepSpeed and ZeRO stage 3 optimization (a minimal config sketch follows this list)
- Training ran for approximately 4 hours with a batch size of 64
- The training data was generated using 15 system instructions from the Orca research paper
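For context on the bullets above, a minimal DeepSpeed configuration enabling ZeRO stage 3 with the stated batch size might look like the following. This is an assumed sketch, not the authors' actual configuration; precision and communication settings are illustrative.

```python
# Minimal DeepSpeed config sketch (assumed, not the authors' actual file).
# ZeRO stage 3 partitions optimizer state, gradients, and parameters
# across the 8 GPUs so a 3B model trains comfortably on 80 GB cards.
ds_config = {
    "train_batch_size": 64,            # global batch size from the model card
    "gradient_accumulation_steps": 1,  # illustrative
    "bf16": {"enabled": True},         # illustrative precision choice
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
    },
}
# This dict would be passed as `config` to deepspeed.initialize(...)
# alongside the model and optimizer.
```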
## Core Capabilities
- Efficient CPU+GPU inference with multiple quantization options
- Support for context length up to 2048 tokens
- Follows instruction-based prompting with system, user, and input components (see the prompt sketch after this list)
- Optimized for both resource-constrained and high-performance environments
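To illustrate the prompting scheme mentioned above, the helper below assembles a prompt in the system/user/input layout used by the original orca_mini models. The header strings follow the upstream model card's template and the example strings are my own; verify both against the card before relying on them.

```python
def build_prompt(system: str, instruction: str, input_text: str = "") -> str:
    """Assemble an orca_mini-style prompt (### System / ### User / ### Input)."""
    prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n"
    if input_text:  # the Input block is optional
        prompt += f"### Input:\n{input_text}\n\n"
    return prompt + "### Response:\n"

prompt = build_prompt(
    system="You are an AI assistant that follows instruction extremely well.",
    instruction="Summarize the text below in one sentence.",
    input_text="GGML stores quantized weights for fast CPU inference.",
)
print(prompt)
```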
## Frequently Asked Questions
Q: What makes this model unique?

A: It combines the efficiency of GGML quantization with Orca-style instruction tuning, making it particularly suitable for deployment in resource-constrained environments while maintaining good output quality.
Q: What are the recommended use cases?

A: The model is ideal for applications that require local deployment, particularly when GPU resources are limited. It suits text generation, instruction following, and general language-understanding tasks where the balance between quality and resource usage matters.