Mistral_Pro_8B_v0.1
| Property | Value |
|---|---|
| Parameter Count | 8.99B |
| License | Apache-2.0 |
| Tensor Type | BF16 |
| Author | TencentARC |
| Language | English |
What is Mistral_Pro_8B_v0.1?
Mistral_Pro_8B_v0.1 is a language model developed by Tencent's ARC Lab that builds upon the original Mistral-7B architecture. With 8.99 billion parameters, it is specifically enhanced for programming and mathematical tasks while maintaining strong general language capabilities.
Implementation Details
The model extends the original Mistral architecture with additional Transformer blocks and is trained on diverse datasets including Cosmopedia, Proof-Pile-2, The Stack, and AutoMathText. Its weights are stored in BF16 precision, balancing numerical accuracy against memory footprint.
- Expanded Transformer stack with additional decoder blocks beyond Mistral-7B's
- Trained on four major datasets focusing on general knowledge, proofs, code, and mathematics
- Optimized for both general language understanding and domain-specific tasks
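The reported 8.99B parameter count is consistent with a block-expanded Mistral. A rough sketch, assuming Mistral-7B's published configuration (hidden size 4096, grouped-query attention with 8 KV heads, intermediate size 14336, 32000-token vocabulary) and an expansion from 32 to 40 decoder layers; the 40-layer depth is an assumption chosen to match the stated total:

```python
# Rough parameter count for a block-expanded Mistral decoder.
# Configuration values are Mistral-7B's published defaults; the
# 40-layer expanded depth is an assumption consistent with 8.99B.
HIDDEN = 4096
KV_HIDDEN = 1024      # 8 KV heads * 128 head dim (grouped-query attention)
INTERMEDIATE = 14336
VOCAB = 32000

def layer_params() -> int:
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_HIDDEN  # q/o + k/v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                           # gate, up, down projections
    norms = 2 * HIDDEN                                        # two RMSNorm weight vectors
    return attention + mlp + norms

def total_params(layers: int) -> int:
    embeddings = 2 * VOCAB * HIDDEN                           # input embeddings + untied LM head
    return layers * layer_params() + embeddings + HIDDEN      # + final RMSNorm

print(f"32 layers: {total_params(32) / 1e9:.2f}B")  # ~7.24B, matching Mistral-7B
print(f"40 layers: {total_params(40) / 1e9:.2f}B")  # ~8.99B, matching Mistral_Pro_8B
```

Under these assumptions, the extra eight decoder blocks account for the roughly 1.75B additional parameters over the base model.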
Core Capabilities
- Superior performance on mathematical reasoning (GSM8K: 50.6%)
- Enhanced code generation capabilities (HumanEval: 32.9%)
- Improved truthfulness in responses (TruthfulQA: 48.3%)
- Strong general language understanding (HellaSwag: 82.6%)
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its balanced performance across general language tasks while offering superior capabilities in programming and mathematics, matching or exceeding the performance of models like Gemma-7B in several benchmarks.
Q: What are the recommended use cases?
The model is ideal for applications requiring integrated handling of natural language, programming, and mathematical tasks. It's particularly well-suited for code generation, mathematical problem-solving, and general language understanding tasks.
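A minimal loading and generation sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub as `TencentARC/Mistral_Pro_8B_v0.1` (the repository id is an assumption based on the author and model names above):

```python
# Minimal generation sketch; the repository id is an assumption and
# the download is large (~18 GB of BF16 weights), so expect a long
# first run. Requires `pip install torch transformers`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TencentARC/Mistral_Pro_8B_v0.1"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # model card lists BF16 tensors
    device_map="auto",           # place layers on available GPU(s)
)

prompt = "Write a Python function that returns the n-th Fibonacci number."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Plain-text completion is shown here; for chat-style use, check the model repository for a recommended prompt template.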