Magnum V2 72B
| Property | Value |
|---|---|
| Parameter Count | 72.7B |
| Base Model | Qwen2-72B-Instruct |
| License | Tongyi Qianwen |
| Supported Languages | 9 (en, fr, de, es, it, pt, ru, zh, ja) |
| Training Infrastructure | 8x AMD Instinct MI300X Accelerators |
What is magnum-v2-72b?
Magnum V2 72B is the seventh iteration in Anthracite's series of language models designed to replicate the prose quality of Claude 3. Built on Qwen2-72B-Instruct, it pairs broad multilingual coverage with solid instruction following.
Implementation Details
The model was fine-tuned for 2 epochs on 8x AMD Instinct MI300X accelerators. Training used a weight decay of 0.01 for regularization and a peak learning rate of 4e-6. The model uses ChatML formatting for interactions (a prompt sketch follows the benchmark list below) and was trained with 16k-token sample packing. Reported benchmark results include:
- Achieves 75.6% accuracy on IFEval (0-shot)
- Scores 57.85% on BBH (3-shot)
- Demonstrates 31.65% accuracy on MATH Level 5 problems
- Supports 9 different languages for multilingual applications
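As noted above, the model expects ChatML-formatted prompts. Below is a minimal sketch of that layout; the system and user strings are placeholder examples, not values from the model card.

```python
# Hypothetical ChatML prompt layout (roles wrapped in <|im_start|> / <|im_end|>).
system = "You are a helpful assistant."
user = "Write a short scene set in a rainy harbor town."

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"  # the model's reply continues from here
)
print(prompt)
```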
Core Capabilities
- High-quality prose generation similar to Claude 3
- Robust instruction following with ChatML format
- Mathematical reasoning (31.65% accuracy on MATH Level 5)
- Multilingual support across major world languages
- Extended context handling with 16k tokens
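For a concrete sense of how these capabilities are exercised, here is a minimal inference sketch using Hugging Face transformers. The repository ID is an assumption (check the published model page for the exact name); `tokenizer.apply_chat_template` renders the ChatML formatting automatically.

```python
# Minimal inference sketch; the repo ID is assumed, verify against the actual model page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anthracite-org/magnum-v2-72b"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 72.7B parameters: expect multi-GPU or heavy VRAM use
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Moby-Dick in French."},
]

# Render the conversation with the model's ChatML chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```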
Frequently Asked Questions
Q: What makes this model unique?
The model's distinctive feature is its optimization for Claude-like responses while maintaining strong performance across various benchmarks. It combines the robust architecture of Qwen2 with carefully curated training datasets to achieve high-quality outputs.
Q: What are the recommended use cases?
This model excels in scenarios requiring high-quality prose generation, multilingual communication, and complex reasoning tasks. It's particularly suitable for applications needing human-like responses across different languages and domains.