wizard-mega-13b

Maintained by: openaccess-ai-collective

Wizard-Mega-13B

Property                  Value
Base Model                LLaMA 13B
Training Framework        Axolotl
Training Infrastructure   8x A100 80GB GPUs
Training Duration         15 hours

What is wizard-mega-13b?

Wizard-Mega-13B is a language model built on the LLaMA 13B architecture and fine-tuned on a combination of the ShareGPT, WizardLM, and Wizard-Vicuna datasets. The training data was filtered to remove responses containing typical AI disclaimers, with the aim of producing more natural and direct interactions. The model has since been succeeded by Manticore-13B.
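
The disclaimer filtering described above can be pictured with a small sketch. This is not the maintainers' actual filtering script: the marker phrases, the `response` field name, and the function names are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical marker phrases. The real filter list used for these
# datasets is not given in this document.
DISCLAIMER_MARKERS = (
    "as an ai language model",
    "i'm sorry, but",
    "i cannot fulfill",
)


def has_disclaimer(text: str) -> bool:
    """Return True if the text contains any marker phrase (case-insensitive)."""
    lowered = text.lower()
    return any(marker in lowered for marker in DISCLAIMER_MARKERS)


def filter_examples(examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only examples whose 'response' field has no disclaimer marker.

    The 'response' key is an assumed record layout, not a documented schema.
    """
    return [ex for ex in examples if not has_disclaimer(ex["response"])]
```

Simple substring matching like this is easy to audit, but it can also drop legitimate responses that happen to quote a marker phrase, so any real phrase list would need review against the data.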

Implementation Details

The model was trained with the Axolotl framework on 8x A100 80GB GPUs for 15 hours. The released model is the checkpoint from the end of epoch two: evaluation loss increased during the third epoch, which suggests the model had started to overfit beyond that point.
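
The checkpoint choice described above amounts to picking the epoch with the lowest evaluation loss. The sketch below shows that selection rule in isolation; the function name is hypothetical and the loss values are placeholders, not the model's real measurements.

```python
from typing import Dict


def best_epoch(eval_loss_by_epoch: Dict[int, float]) -> int:
    """Return the epoch with the lowest evaluation loss.

    Epochs are scanned in ascending order, so the earliest epoch wins ties.
    """
    if not eval_loss_by_epoch:
        raise ValueError("no evaluation losses recorded")
    return min(sorted(eval_loss_by_epoch), key=lambda epoch: eval_loss_by_epoch[epoch])


# Placeholder numbers for illustration only. These are NOT the real
# evaluation losses of Wizard-Mega-13B.
example_losses = {1: 0.95, 2: 0.88, 3: 0.91}
selected = best_epoch(example_losses)
```

With these placeholder values the loss falls through epoch two and rises at epoch three, so the rule selects epoch two, mirroring the pattern the text describes.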

  • Built with transformer architecture using PyTorch
  • Trained on filtered versions of ShareGPT, WizardLM, and Wizard-Vicuna datasets
  • Optimized for text generation and instruction following
  • Available in quantized GGML format for efficient inference
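
To give a rough sense of why a quantized build helps with inference, the sketch below estimates weight storage from parameter count and bits per weight. It assumes a nominal 13 billion parameters (taken from the "13B" name) and idealized bit widths. Real quantized files store extra per-block metadata, and inference needs additional memory beyond the weights, so actual figures will be somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter count implied by the "13B" name (an assumption).
N_PARAMS = 13e9

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
q4_gb = weight_memory_gb(N_PARAMS, 4)     # idealized 4-bit weights

print(f"fp16: ~{fp16_gb:.1f} GB, 4-bit: ~{q4_gb:.1f} GB")
```

Under these assumptions the weights shrink from about 26 GB at 16 bits to about 6.5 GB at 4 bits, which is the difference between needing a data-center GPU and fitting on consumer hardware.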

Core Capabilities

  • Advanced text generation and completion
  • Code generation with context understanding
  • Natural conversation and instruction following
  • Efficient local inference through the quantized GGML build

Frequently Asked Questions

Q: What makes this model unique?

The model is distinguished by its filtered training data: responses containing typical AI disclaimers and hesitations were removed, which leads to more direct and natural responses. It also draws on multiple instruction-following datasets.

Q: What are the recommended use cases?

The model excels in code generation, creative writing, and general instruction-following tasks. However, users should note it hasn't undergone RLHF alignment, so appropriate content filtering may be necessary for production use.
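
As one very simple illustration of post-generation content filtering, the sketch below masks blocked terms in model output. The blocklist, the replacement string, and the function names are hypothetical. A keyword filter of this kind is only a starting point and is not a substitute for a proper moderation pipeline.

```python
import re
from typing import Callable, Iterable


def make_output_filter(
    blocked_terms: Iterable[str], replacement: str = "[filtered]"
) -> Callable[[str], str]:
    """Build a function that masks whole-word, case-insensitive matches."""
    escaped = [re.escape(term) for term in blocked_terms if term]
    if not escaped:
        # Nothing to block: return the text unchanged.
        return lambda text: text
    pattern = re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)
    return lambda text: pattern.sub(replacement, text)


# Hypothetical blocklist for illustration only.
filter_output = make_output_filter(["badword", "secret"])
```

The filter would be applied to each generated string before it is shown to a user, for example `filter_output(generated_text)`, where `generated_text` stands for whatever the inference code returns.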
