# orca_mini_3b
| Property | Value |
|---|---|
| Parameter Count | 3.43B |
| License | CC-BY-NC-SA-4.0 |
| Base Architecture | OpenLLaMA |
| Research Paper | [Orca Paper](https://arxiv.org/abs/2306.02707) |
| Average Benchmark Score | 39.03% |
## What is orca_mini_3b?
orca_mini_3b is an instruction-tuned language model based on OpenLLaMA-3B, specifically trained on explanation-tuned datasets from WizardLM, Alpaca, and Dolly-V2. The model implements approaches from the Orca research paper to better capture complex reasoning and explanation capabilities.
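A minimal inference sketch with Hugging Face `transformers` is shown below. The Hub id `psmathur/orca_mini_3b` and the `### System` / `### User` / `### Response` prompt template are assumptions based on how orca_mini models are commonly published; verify both against the official model card before use.

```python
# Minimal inference sketch. Assumptions: the model lives on the Hugging Face
# Hub as "psmathur/orca_mini_3b" and uses the ### System/### User/### Response
# prompt template; check the official model card for the exact format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "psmathur/orca_mini_3b"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # float16 inference, per the model card
    device_map="auto",
)

system = "You are an AI assistant that gives detailed, step-by-step explanations."
instruction = "Why does ice float on water?"
prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# max_new_tokens kept small so prompt + output stay within the 1024-token context
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```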
## Implementation Details
The model was trained with DeepSpeed using fully sharded data parallelism (ZeRO stage 3) on 8× A100 (80 GB) GPUs. Training took approximately 4 hours with a global batch size of 64 and a learning rate of 2e-5 over 3 epochs; a configuration sketch follows the list below.
- Training dataset combines ~70K WizardLM, ~52K Alpaca, and ~15K Dolly-V2 samples
- Implements the 15 system instructions from the Orca research paper
- Uses float16 precision for inference
- Maximum sequence length of 1024 tokens
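The reported hyperparameters map onto a Hugging Face `Trainer` plus DeepSpeed ZeRO-3 setup roughly as follows. This is a reconstruction from the numbers above, not the authors' actual training script, and the per-device batch split across the 8 GPUs is an assumption.

```python
# Sketch of the reported training setup (DeepSpeed ZeRO stage 3, global batch
# size 64, lr 2e-5, 3 epochs, fp16). A reconstruction, not the authors' script.
from transformers import TrainingArguments

ds_config = {  # minimal ZeRO stage-3 DeepSpeed config
    "zero_optimization": {"stage": 3},
    "fp16": {"enabled": True},
    "train_batch_size": 64,           # global batch size from the model card
    "gradient_accumulation_steps": 1,
}

args = TrainingArguments(
    output_dir="orca_mini_3b",
    per_device_train_batch_size=8,    # 8 GPUs x 8 = global batch 64 (assumed split)
    num_train_epochs=3,
    learning_rate=2e-5,
    fp16=True,
    deepspeed=ds_config,              # Trainer accepts a dict or a JSON path
)
```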
## Core Capabilities
- Strong performance on reasoning tasks (Winogrande: 61.8%)
- Decent truth assessment abilities (TruthfulQA: 42.42%)
- Good common sense understanding (HellaSwag: 61.52%)
- Limited broad knowledge (MMLU: 26.79%, close to the 25% random baseline for four-option questions)
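Scores like these can be checked with EleutherAI's lm-evaluation-harness. The sketch below assumes its v0.4 Python API and standard task names; the few-shot settings and task versions behind the reported numbers may differ, so results will not match exactly.

```python
# Benchmark sketch using EleutherAI's lm-evaluation-harness (v0.4 Python API
# assumed; task names and few-shot settings may differ from the reported runs).
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=psmathur/orca_mini_3b,dtype=float16",  # assumed Hub id
    tasks=["winogrande", "hellaswag", "truthfulqa_mc2", "mmlu"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```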
## Frequently Asked Questions
Q: What makes this model unique?
The model's distinguishing feature is its implementation of the Orca paper's methodology: learning thought processes from a teacher model (ChatGPT) through carefully constructed system prompts and explanation-tuned datasets, as sketched below.
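The core data trick can be sketched as follows: each instruction from the source datasets is paired with one of the Orca-style system messages before being sent to the teacher, so the collected responses contain explanations rather than bare answers. The system messages here are illustrative paraphrases, not the paper's verbatim list, and `ask_teacher` is a hypothetical stand-in for a ChatGPT API call.

```python
# Explanation-tuning sketch: pair each instruction with an Orca-style system
# message and record the teacher's explanation-rich response as one training
# sample. System messages are illustrative paraphrases, not the paper's exact
# list; ask_teacher is a hypothetical stand-in for a ChatGPT API call.
import random

SYSTEM_MESSAGES = [
    "You are an AI assistant. Give a detailed answer so the user does not "
    "need to search elsewhere to understand it.",
    "You are an AI assistant that helps people find information. "
    "Think step by step and justify your answer.",
    "Explain like I am five years old.",
]

def build_explanation_sample(instruction: str, ask_teacher) -> dict:
    """Return one explanation-tuned training record."""
    system = random.choice(SYSTEM_MESSAGES)
    response = ask_teacher(system, instruction)  # hypothetical teacher call
    return {"system": system, "instruction": instruction, "response": response}
```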
Q: What are the recommended use cases?
The model is well-suited for general text generation tasks, reasoning problems, and explanation generation. However, it should not be relied upon for factual accuracy or used in production without proper evaluation.