Wizard-Mega-13B
Property | Value |
---|---|
Base Model | LLaMA 13B |
Training Framework | Axolotl |
Training Infrastructure | 8x A100 80GB GPUs |
Training Duration | 15 hours |
What is Wizard-Mega-13B?
Wizard-Mega-13B is a language model built on the LLaMA 13B architecture and fine-tuned on a curated combination of the ShareGPT, WizardLM, and Wizard-Vicuna datasets. The training data was filtered to remove responses containing typical AI disclaimers, which yields more natural and direct interactions. The model marked an important milestone in open-source AI development, though it has since been succeeded by Manticore-13B.
Implementation Details
The model was trained with the Axolotl framework on an 8x A100 80GB GPU setup for 15 hours. Training was stopped after two epochs: evaluation loss began to rise during a third epoch, so the epoch-two checkpoint was kept as the released model.
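As a rough illustration of that stopping rule (this is not the Axolotl training loop, and the loss values are hypothetical), checkpoint selection amounts to keeping the last epoch before evaluation loss rises:

```python
# Hypothetical per-epoch evaluation losses; only the shape of the curve matters here.
eval_losses = {1: 1.12, 2: 1.04, 3: 1.09}

# Keep the last epoch before evaluation loss starts rising (epoch 2 in this run).
best_epoch = 1
for epoch in sorted(eval_losses)[1:]:
    if eval_losses[epoch] >= eval_losses[best_epoch]:
        break  # loss stopped improving; keep the earlier checkpoint
    best_epoch = epoch

print(f"Releasing checkpoint from epoch {best_epoch}")
```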
- Built with transformer architecture using PyTorch
- Trained on filtered versions of ShareGPT, WizardLM, and Wizard-Vicuna datasets
- Optimized for text generation and instruction following
- Available in quantized GGML format for efficient inference
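For the quantized GGML build mentioned above, a minimal inference sketch using the llama-cpp-python bindings might look like the following; the file name and prompt template are assumptions rather than part of the official release, and newer llama.cpp builds expect GGUF, so a format conversion may be needed:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical file name for a 4-bit quantized build of Wizard-Mega-13B.
llm = Llama(model_path="wizard-mega-13b.ggmlv3.q4_0.bin", n_ctx=2048)

# Assumed Alpaca-style instruction template; check the model card for the exact format.
prompt = (
    "### Instruction: Summarize what overfitting means in one sentence.\n\n"
    "### Assistant:"
)

out = llm(prompt, max_tokens=128, temperature=0.7, stop=["### Instruction:"])
print(out["choices"][0]["text"].strip())
```

At 4-bit quantization the 13B weights fit in well under 10 GB of memory, which is what makes CPU or single consumer-GPU inference practical.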
Core Capabilities
- Advanced text generation and completion
- Code generation with context understanding
- Natural conversation and instruction following
- Efficient inference via quantized builds and standard runtime optimizations
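To show the instruction-following and code-generation use in practice, here is a minimal full-precision sketch using Hugging Face transformers; the repository id and the Alpaca-style prompt template are assumptions, so check the model card for the exact values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; substitute whichever copy of the weights you use.
model_id = "openaccess-ai-collective/wizard-mega-13b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B in fp16 needs roughly 26 GB of GPU memory
    device_map="auto",
)

# Assumed instruction template, matching the GGML example above.
prompt = (
    "### Instruction: Write a Python function that checks whether a string "
    "is a palindrome.\n\n### Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```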
Frequently Asked Questions
Q: What makes this model unique?
The model's distinguishing feature is its filtered training data: responses containing typical AI disclaimers and hesitations were removed, which makes its answers more direct and natural. It also benefits from combining multiple high-quality instruction-following datasets.
Q: What are the recommended use cases?
The model excels in code generation, creative writing, and general instruction-following tasks. However, users should note it hasn't undergone RLHF alignment, so appropriate content filtering may be necessary for production use.
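As a rough illustration of the kind of lightweight output filtering mentioned above (a hypothetical keyword screen, not a substitute for a proper moderation model or service):

```python
import re

# Hypothetical blocklist; a real deployment would use a dedicated moderation
# model or API rather than keyword matching.
BLOCKED_PATTERNS = [r"\bcredit card number\b", r"\bhow to make a weapon\b"]

def filter_output(text: str) -> str:
    """Return the model output, or a placeholder if it matches a blocked pattern."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return "[response withheld by content filter]"
    return text

print(filter_output("Here is a haiku about spring."))
```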