# Manticore-13B
| Property | Value |
|---|---|
| Base Model | LLaMA 13B |
| Training Infrastructure | 8× A100 80 GB GPUs |
| Framework | PyTorch / Transformers |
| Primary Language | English |
## What is Manticore-13B?
Manticore-13B is an advanced language model built on the LLaMA 13B architecture, fine-tuned on a carefully curated collection of diverse datasets. Developed by the OpenAccess AI Collective, this model represents a significant advancement in instruction-following and general-purpose text generation capabilities.
## Implementation Details
The model was trained with the Axolotl framework for 3 epochs, taking approximately 24 hours on 8× A100 80 GB GPUs. It was fine-tuned on multiple high-quality datasets, including ShareGPT, WizardLM, Wizard-Vicuna, and specialized instruction sets for various tasks (a loading sketch follows the list below).
- Compatible with the text-generation-inference server for optimized serving
- Includes GGML quantized versions for efficient CPU and low-resource deployment (see the llama-cpp-python sketch at the end of this page)
- Trained on 10 diverse datasets for broad knowledge and task coverage
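For loading the model, here is a minimal sketch using the Hugging Face transformers API. The repository id and the fp16/device-map settings are illustrative assumptions, not requirements from the model card; adjust them to your environment.

```python
# Minimal loading sketch. Assumptions: the repo id below is the published
# checkpoint, and fp16 weights fit your hardware (a 13B model in fp16 needs
# roughly 26 GB of memory across GPU and CPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "openaccess-ai-collective/manticore-13b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision to reduce the memory footprint
    device_map="auto",          # let accelerate place layers across devices
)
```

With `device_map="auto"`, the accelerate library shards the weights across available GPUs and CPU RAM, which is convenient for a 13B checkpoint on smaller hardware.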
## Core Capabilities
- Advanced instruction following and task completion (see the sketch after this list)
- Scientific and logical reasoning (trained on MMLU subset)
- Code generation and explanation
- Summarization and content generation
- Role-playing and creative writing
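As a concrete illustration of the instruction-following capability, the sketch below sends a single instruction through the transformers text-generation pipeline. The `USER:`/`ASSISTANT:` turn format is an assumption based on the ShareGPT/Vicuna-style training data; check the model card for the exact template.

```python
# Instruction-following sketch; the prompt template is an assumption
# (Vicuna-style), not a documented guarantee.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openaccess-ai-collective/manticore-13b",  # assumed repo id
    device_map="auto",
)

prompt = (
    "USER: Explain why the sky is blue, first technically and then "
    "for a child.\nASSISTANT:"
)
result = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```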
## Frequently Asked Questions
### Q: What makes this model unique?
Manticore-13B stands out for its comprehensive training on diverse, high-quality datasets and its ability to handle both technical and creative tasks effectively. Unlike many models, it maintains strong performance without RLHF alignment, making it versatile for various applications.
### Q: What are the recommended use cases?
The model excels in code generation, scientific explanation, creative writing, and general instruction following. It's particularly suitable for applications requiring detailed responses in technical fields like physics, logic, and mathematics.
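For lightweight local use of the GGML quantized builds mentioned under Implementation Details, here is a sketch using llama-cpp-python. The file path is a hypothetical placeholder for a locally downloaded GGML checkpoint, and the prompt uses the same assumed `USER:`/`ASSISTANT:` format.

```python
# Local GGML inference sketch (pip install llama-cpp-python).
# The model_path is a hypothetical local file name, not a published URL.
from llama_cpp import Llama

llm = Llama(
    model_path="./manticore-13b.ggmlv3.q4_0.bin",  # hypothetical local GGML file
    n_ctx=2048,  # context window size
)

prompt = (
    "USER: Write a Python function that returns the n-th Fibonacci number "
    "iteratively, then explain it in one sentence.\nASSISTANT:"
)
out = llm(prompt, max_tokens=256, temperature=0.7, stop=["USER:"])
print(out["choices"][0]["text"])
```

Note that GGML is a legacy format: recent llama-cpp-python releases expect GGUF files, so loading a GGML checkpoint may require an older release or a converted file.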