Llama3-70B-Chinese-Chat
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Language Model |
| Architecture | Llama3 |
| License | Llama3 License |
| Training Framework | LLaMA-Factory |
| Paper | ORPO Paper |
What is Llama3-70B-Chinese-Chat?
Llama3-70B-Chinese-Chat is a large language model fine-tuned for Chinese and English users. Built upon Meta's Llama-3-70B-Instruct model, it was trained on over 100,000 preference pairs using the ORPO algorithm (reference-free monolithic preference optimization with odds ratio). The model performs strongly on Chinese language tasks, matching GPT-4's results on benchmarks such as C-Eval and CMMLU.
Implementation Details
The model was trained with full-parameter fine-tuning using the LLaMA-Factory framework. Key hyperparameters (see the ORPO loss sketch after this list):
- Training epochs: 3
- Learning rate: 1.5e-6 with a cosine scheduler
- Warmup ratio: 0.1
- Global batch size: 128
- Optimizer: paged_adamw_32bit
- Context length: 8192 tokens
- ORPO beta: 0.05
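For intuition on what the ORPO beta of 0.05 controls: ORPO adds an odds-ratio penalty, weighted by beta, on top of the standard SFT loss. Below is a minimal, illustrative PyTorch sketch of that objective; it is a simplification for intuition, not the LLaMA-Factory implementation, and the function name and inputs are hypothetical.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, beta=0.05):
    """Illustrative ORPO objective (simplified).

    chosen_logps / rejected_logps: average per-token log-probabilities
    of the chosen and rejected responses under the policy model.
    """
    # log(odds(y)) = log(p / (1 - p)), computed in log-space for stability
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: push the chosen response's odds above the rejected one's
    ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Total loss = NLL on the chosen response + beta-weighted odds-ratio penalty
    sft_loss = -chosen_logps
    return (sft_loss - beta * ratio).mean()

# Example: a batch of two preference pairs
loss = orpo_loss(torch.tensor([-1.2, -0.8]), torch.tensor([-2.3, -1.9]))
```

Because the small beta keeps the odds-ratio term a gentle regularizer, training stays close to ordinary supervised fine-tuning on the chosen responses while still discouraging the rejected ones.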
Core Capabilities
- Bilingual proficiency in Chinese and English
- Advanced roleplaying abilities
- Strong mathematical reasoning
- Function calling capabilities
- Matches GPT-4 performance on Chinese benchmarks: 66.1% on C-Eval and 70.28% on CMMLU
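To try the model's bilingual chat capabilities, here is a minimal inference sketch using the Hugging Face transformers library. The repo id and the generation settings are assumptions for illustration, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32; 70B still needs multiple GPUs
    device_map="auto",
)

# Chinese prompt: "Write a seven-character quatrain about spring"
messages = [{"role": "user", "content": "写一首关于春天的七言绝句"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.9
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```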
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its Chinese language capabilities, matching GPT-4's performance on major Chinese benchmarks while retaining strong English abilities. It is one of the first LLMs fine-tuned specifically for Chinese and English users, with capabilities spanning roleplay, function calling, and mathematical reasoning.
Q: What are the recommended use cases?
The model excels in bilingual conversations, roleplay scenarios, mathematical problem-solving, and function calling tasks. It's particularly well-suited for applications requiring strong Chinese language understanding and generation capabilities.
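At 70.6B parameters, the model needs substantial GPU memory (roughly 140 GB in bf16). For smaller multi-GPU setups, a rough sketch of 4-bit loading, assuming transformers and bitsandbytes are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights cut memory to roughly a quarter
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 to preserve quality
)

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"  # assumed repo id, as above
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Quantization trades some output quality for memory, so benchmark results quoted above may not be reproduced exactly under 4-bit inference.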