# LLaMA-Pro-8B
| Property | Value |
|---|---|
| Parameter Count | 8.36B |
| Model Type | Language Model |
| Architecture | LLaMA2-based Transformer |
| License | LLaMA2 |
| Developer | TencentARC |
| Tensor Type | BF16 |
## What is LLaMA-Pro-8B?
LLaMA-Pro-8B is an 8.36-billion-parameter language model from Tencent's ARC Lab, built on the LLaMA2 architecture. It extends LLaMA2-7B with additional Transformer blocks and was further trained on 80 billion tokens of data, including specialized code and mathematical content, to strengthen programming and mathematical ability while preserving general language skills.
## Implementation Details
The model extends the LLaMA architecture with additional Transformer blocks appended to the base network. Weights are stored in BF16 for efficient computation, and training targeted both general language understanding and domain-specific tasks.
- Built on LLaMA2-7B architecture with additional specialized training
- Trained on 80 billion tokens of diverse data
- Optimized for programming and mathematical reasoning
- Implements advanced transformer architecture
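The parameter count implied by the bullets above can be sanity-checked with rough arithmetic. The sketch below assumes the commonly reported LLaMA-Pro configuration (eight decoder blocks added to LLaMA2-7B's 32, hidden size 4096, MLP size 11008); these figures come from the LLaMA-Pro paper rather than this card, so treat them as assumptions.

```python
# Rough estimate of LLaMA-Pro-8B's parameter count from block expansion.
# Assumed figures (not stated in this card):
#   LLaMA2-7B base: ~6.74B params, 32 decoder blocks, hidden=4096, MLP=11008
#   LLaMA-Pro: 8 extra blocks appended (32 -> 40); norm weights ignored (negligible)

HIDDEN = 4096
MLP = 11008
BASE_PARAMS = 6.74e9
ADDED_BLOCKS = 8

# Per-block weights: Q/K/V/O attention projections + gate/up/down MLP matrices
attn = 4 * HIDDEN * HIDDEN       # ~67.1M
mlp = 3 * HIDDEN * MLP           # ~135.3M
per_block = attn + mlp           # ~202.4M

total = BASE_PARAMS + ADDED_BLOCKS * per_block
print(f"estimated total: {total / 1e9:.2f}B")  # ~8.36B, matching the card
```

The estimate lands on 8.36B, consistent with the parameter count in the table above.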
## Core Capabilities
- Outperforms LLaMA2-7B across multiple benchmarks (average score 44.2 vs 39.62)
- Enhanced performance on programming tasks (28.66% on HumanEval)
- Improved mathematical reasoning (25.42% on GSM8K-PoT)
- Strong general language understanding (77.94% on HellaSwag)
- Competitive MT-Bench score in its instruct variant (6.32)
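The headline improvement over LLaMA2-7B follows directly from the two average scores quoted above; the snippet below only restates the card's numbers.

```python
# Average benchmark scores quoted in this card
llama2_7b = 39.62
llama_pro = 44.2

abs_gain = llama_pro - llama2_7b
rel_gain = abs_gain / llama2_7b * 100
print(f"+{abs_gain:.2f} points ({rel_gain:.1f}% relative)")  # +4.58 points (11.6% relative)
```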
## Frequently Asked Questions
**Q: What makes this model unique?**
LLaMA-Pro-8B stands out through its specialized focus on programming and mathematical tasks while maintaining strong general language capabilities. It achieves this through additional transformer blocks and targeted training on domain-specific content.
**Q: What are the recommended use cases?**
The model is particularly well-suited for tasks involving programming, mathematical reasoning, and general language understanding. It's ideal for applications requiring integration of natural language with technical content, code generation, and mathematical problem-solving.
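As a concrete starting point for these use cases, the model can be loaded through Hugging Face transformers with BF16 weights. This is a minimal sketch, not an official example from the card: the repository id `TencentARC/LLaMA-Pro-8B` and the prompt format are assumptions, and the heavy part is gated behind an environment variable because the download is an 8.36B-parameter checkpoint (~17 GB in BF16).

```python
# Minimal usage sketch for LLaMA-Pro-8B (a base model, not instruction-tuned).
# Assumptions: repo id "TencentARC/LLaMA-Pro-8B", transformers installed,
# and enough GPU memory for an 8.36B BF16 model.
import os


def build_prompt(task: str) -> str:
    """Plain-text completion prompt; the base model has no chat template."""
    return f"# Task: {task}\n# Solution:\n"


# Set RUN_LLAMA_PRO_DEMO=1 to actually download and run the model.
if os.environ.get("RUN_LLAMA_PRO_DEMO") == "1":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TencentARC/LLaMA-Pro-8B"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type in the table
        device_map="auto",
    )

    prompt = build_prompt("Write a Python function that reverses a string.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is a reasonable default for code and math completions, where determinism usually matters more than diversity.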