Wizard-Mega-13B
Property | Value |
---|---|
Base Model | LLaMA 13B |
Training Framework | Axolotl |
Training Infrastructure | 8x A100 80GB GPUs |
Training Duration | 15 hours |
What is Wizard-Mega-13B?
Wizard-Mega-13B is a language model built on the LLaMA 13B architecture and fine-tuned on a curated combination of the ShareGPT, WizardLM, and Wizard-Vicuna datasets. The training data was filtered to remove responses containing typical AI disclaimers, which yields more natural and direct interactions. The model marked an important milestone in open-source AI development, though it has since been succeeded by Manticore-13B.
Implementation Details
The model was trained with the Axolotl framework on an 8x A100 80GB GPU setup for 15 hours. Training was stopped after two epochs: evaluation loss began to rise during a third epoch, so the epoch-two checkpoint was kept as the released model.
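As a rough illustration of that stopping rule (this is not the Axolotl training loop, and the loss values are hypothetical), checkpoint selection amounts to keeping the last epoch before evaluation loss rises:

```python
# Hypothetical per-epoch evaluation losses; only the shape of the curve matters here.
eval_losses = {1: 1.12, 2: 1.04, 3: 1.09}

# Keep the last epoch before evaluation loss starts rising (epoch 2 in this run).
best_epoch = 1
for epoch in sorted(eval_losses)[1:]:
    if eval_losses[epoch] >= eval_losses[best_epoch]:
        break  # loss stopped improving; keep the earlier checkpoint
    best_epoch = epoch

print(f"Releasing checkpoint from epoch {best_epoch}")
```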
- Built with transformer architecture using PyTorch
- Trained on filtered versions of ShareGPT, WizardLM, and Wizard-Vicuna datasets
- Optimized for text generation and instruction following
- Available in quantized GGML format for efficient inference
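For the quantized GGML build mentioned above, a minimal inference sketch using the llama-cpp-python bindings might look like the following; the file name and prompt template are assumptions rather than part of the official release, and newer llama.cpp builds expect GGUF, so a format conversion may be needed:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical file name for a 4-bit quantized build of Wizard-Mega-13B.
llm = Llama(model_path="wizard-mega-13b.ggmlv3.q4_0.bin", n_ctx=2048)

# Assumed Alpaca-style instruction template; check the model card for the exact format.
prompt = (
    "### Instruction: Summarize what overfitting means in one sentence.\n\n"
    "### Assistant:"
)

out = llm(prompt, max_tokens=128, temperature=0.7, stop=["### Instruction:"])
print(out["choices"][0]["text"].strip())
```

At 4-bit quantization the 13B weights fit in well under 10 GB of memory, which is what makes CPU or single consumer-GPU inference practical.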
Core Capabilities
- Advanced text generation and completion
- Code generation with context understanding
- Natural conversation and instruction following
- Efficient inference via quantized builds and standard runtime optimizations
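To show the instruction-following and code-generation use in practice, here is a minimal full-precision sketch using Hugging Face transformers; the repository id and the Alpaca-style prompt template are assumptions, so check the model card for the exact values:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id; substitute whichever copy of the weights you use.
model_id = "openaccess-ai-collective/wizard-mega-13b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 13B in fp16 needs roughly 26 GB of GPU memory
    device_map="auto",
)

# Assumed instruction template, matching the GGML example above.
prompt = (
    "### Instruction: Write a Python function that checks whether a string "
    "is a palindrome.\n\n### Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```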
Frequently Asked Questions
Q: What makes this model unique?
The model's distinguishing feature is its filtered training data: responses containing typical AI disclaimers and hesitations were removed, which makes its answers more direct and natural. It also benefits from combining multiple high-quality instruction-following datasets.
Q: What are the recommended use cases?
The model excels in code generation, creative writing, and general instruction-following tasks. However, users should note it hasn't undergone RLHF alignment, so appropriate content filtering may be necessary for production use.
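As a rough illustration of the kind of lightweight output filtering mentioned above (a hypothetical keyword screen, not a substitute for a proper moderation model or service):

```python
import re

# Hypothetical blocklist; a real deployment would use a dedicated moderation
# model or API rather than keyword matching.
BLOCKED_PATTERNS = [r"\bcredit card number\b", r"\bhow to make a weapon\b"]

def filter_output(text: str) -> str:
    """Return the model output, or a placeholder if it matches a blocked pattern."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return "[response withheld by content filter]"
    return text

print(filter_output("Here is a haiku about spring."))
```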