# LLaMA-Pro-8B
| Property | Value |
|---|---|
| Parameter Count | 8.36B |
| Model Type | Language Model |
| Architecture | LLaMA2-based Transformer |
| License | LLaMA2 |
| Developer | TencentARC |
| Tensor Type | BF16 |
## What is LLaMA-Pro-8B?
LLaMA-Pro-8B is an 8.36-billion-parameter language model from Tencent's ARC Lab, built on the LLaMA2 architecture. It extends LLaMA2-7B with additional Transformer blocks and was further trained on 80 billion tokens of data, including specialized code and mathematical content, to strengthen programming and mathematical ability while preserving general language skills.
## Implementation Details
The model extends the LLaMA architecture with additional Transformer blocks appended to the base network. Weights are stored in BF16 for efficient computation, and training targeted both general language understanding and domain-specific tasks.
- Built on LLaMA2-7B architecture with additional specialized training
- Trained on 80 billion tokens of diverse data
- Optimized for programming and mathematical reasoning
- Implements advanced transformer architecture
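The parameter count implied by the bullets above can be sanity-checked with rough arithmetic. The sketch below assumes the commonly reported LLaMA-Pro configuration (eight decoder blocks added to LLaMA2-7B's 32, hidden size 4096, MLP size 11008); these figures come from the LLaMA-Pro paper rather than this card, so treat them as assumptions.

```python
# Rough estimate of LLaMA-Pro-8B's parameter count from block expansion.
# Assumed figures (not stated in this card):
#   LLaMA2-7B base: ~6.74B params, 32 decoder blocks, hidden=4096, MLP=11008
#   LLaMA-Pro: 8 extra blocks appended (32 -> 40); norm weights ignored (negligible)

HIDDEN = 4096
MLP = 11008
BASE_PARAMS = 6.74e9
ADDED_BLOCKS = 8

# Per-block weights: Q/K/V/O attention projections + gate/up/down MLP matrices
attn = 4 * HIDDEN * HIDDEN       # ~67.1M
mlp = 3 * HIDDEN * MLP           # ~135.3M
per_block = attn + mlp           # ~202.4M

total = BASE_PARAMS + ADDED_BLOCKS * per_block
print(f"estimated total: {total / 1e9:.2f}B")  # ~8.36B, matching the card
```

The estimate lands on 8.36B, consistent with the parameter count in the table above.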
## Core Capabilities
- Outperforms LLaMA2-7B across multiple benchmarks (average score 44.2 vs 39.62)
- Enhanced performance on programming tasks (28.66% on HumanEval)
- Improved mathematical reasoning (25.42% on GSM8K-PoT)
- Strong general language understanding (77.94% on HellaSwag)
- Competitive MT-Bench score in its instruct variant (6.32)
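The headline improvement over LLaMA2-7B follows directly from the two average scores quoted above; the snippet below only restates the card's numbers.

```python
# Average benchmark scores quoted in this card
llama2_7b = 39.62
llama_pro = 44.2

abs_gain = llama_pro - llama2_7b
rel_gain = abs_gain / llama2_7b * 100
print(f"+{abs_gain:.2f} points ({rel_gain:.1f}% relative)")  # +4.58 points (11.6% relative)
```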
## Frequently Asked Questions
**Q: What makes this model unique?**
LLaMA-Pro-8B stands out through its specialized focus on programming and mathematical tasks while maintaining strong general language capabilities. It achieves this through additional transformer blocks and targeted training on domain-specific content.
**Q: What are the recommended use cases?**
The model is particularly well-suited for tasks involving programming, mathematical reasoning, and general language understanding. It's ideal for applications requiring integration of natural language with technical content, code generation, and mathematical problem-solving.
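As a concrete starting point for these use cases, the model can be loaded through Hugging Face transformers with BF16 weights. This is a minimal sketch, not an official example from the card: the repository id `TencentARC/LLaMA-Pro-8B` and the prompt format are assumptions, and the heavy part is gated behind an environment variable because the download is an 8.36B-parameter checkpoint (~17 GB in BF16).

```python
# Minimal usage sketch for LLaMA-Pro-8B (a base model, not instruction-tuned).
# Assumptions: repo id "TencentARC/LLaMA-Pro-8B", transformers installed,
# and enough GPU memory for an 8.36B BF16 model.
import os


def build_prompt(task: str) -> str:
    """Plain-text completion prompt; the base model has no chat template."""
    return f"# Task: {task}\n# Solution:\n"


# Set RUN_LLAMA_PRO_DEMO=1 to actually download and run the model.
if os.environ.get("RUN_LLAMA_PRO_DEMO") == "1":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TencentARC/LLaMA-Pro-8B"  # assumed repository id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type in the table
        device_map="auto",
    )

    prompt = build_prompt("Write a Python function that reverses a string.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is a reasonable default for code and math completions, where determinism usually matters more than diversity.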