Magnum v2 72B
| Property | Value |
|---|---|
| Parameter Count | 72.7B |
| Base Model | Qwen/Qwen2-72B-Instruct |
| License | Tongyi Qianwen |
| Supported Languages | 9 (EN, FR, DE, ES, IT, PT, RU, ZH, JA) |
| Training Hardware | 8x AMD Instinct MI300X |
What is magnum-v2-72b?
Magnum v2 72B is the seventh iteration in Anthracite's series of models aimed at replicating the prose quality of Claude 3. Built on Qwen2-72B-Instruct, it was fine-tuned on specialized datasets to improve writing quality and instruction-following ability.
Implementation Details
The model was trained on 8 AMD Instinct MI300X accelerators, with full-parameter fine-tuning over 2 epochs. Key hyperparameters include a weight decay of 0.01 and a peak learning rate of 4e-6, chosen to limit overfitting. The model expects ChatML-formatted prompts and supports a 16k-token context window; a minimal usage sketch follows the list below.
- Trained on multiple high-quality datasets, including Stheno-Data-Filtered and Claude writing datasets
- Trained in BF16 precision
- Uses sample packing at the 16k-token sequence length
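The following is a minimal sketch of loading the model in BF16 and prompting it through its ChatML chat template with the Hugging Face transformers library. The repo id anthracite-org/magnum-v2-72b, the hardware assumptions, and the generation settings are illustrative assumptions, not official recommendations.

```python
# Minimal sketch: load the model in BF16 and prompt it with ChatML.
# Assumes the Hugging Face repo id "anthracite-org/magnum-v2-72b" and enough
# GPU memory for a 72B model (adjust device_map or add quantization otherwise).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthracite-org/magnum-v2-72b"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# The tokenizer's chat template renders messages as ChatML, i.e.
# <|im_start|>role ... <|im_end|> blocks, the format the model expects.
messages = [
    {"role": "system", "content": "You are a helpful writing assistant."},
    {"role": "user", "content": "Write the opening paragraph of a mystery novel."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because the chat template already emits the ChatML control tokens, prompts do not need to be assembled by hand.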
Core Capabilities
- Strong performance on IFEval with 75.6% accuracy
- Impressive BBH performance at 57.85% accuracy
- Notable MMLU-PRO score of 49.51%
- Multilingual support across 9 major languages
- Advanced instruction-following abilities
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its focus on replicating Claude 3's writing quality while maintaining strong performance across multiple benchmarks. Its multilingual capabilities and extensive fine-tuning on specialized datasets make it particularly valuable for diverse applications.
Q: What are the recommended use cases?
The model excels at text generation tasks, particularly those requiring high-quality prose and accurate instruction following. It is well suited to multilingual applications, creative writing, and complex reasoning tasks, as reflected in its IFEval, BBH, and MMLU-PRO results above.