Llama3-70B-Chinese-Chat
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Language Model |
| Architecture | Llama3 |
| License | Llama3 License |
| Training Framework | LLaMA-Factory |
| Paper | ORPO Paper |
What is Llama3-70B-Chinese-Chat?
Llama3-70B-Chinese-Chat is a large language model fine-tuned for Chinese and English users. Built upon Meta's Llama-3-70B-Instruct model, it was trained on over 100,000 preference pairs using the ORPO algorithm (reference-free monolithic preference optimization with odds ratio). The model performs strongly on Chinese language tasks, matching GPT-4's results on benchmarks such as C-Eval and CMMLU.
Implementation Details
The model was trained with full-parameter fine-tuning using the LLaMA-Factory framework. Key hyperparameters (see the ORPO loss sketch after this list):
- Training epochs: 3
- Learning rate: 1.5e-6 with a cosine scheduler
- Warmup ratio: 0.1
- Global batch size: 128
- Optimizer: paged_adamw_32bit
- Context length: 8192 tokens
- ORPO beta: 0.05
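For intuition on what the ORPO beta of 0.05 controls: ORPO adds an odds-ratio penalty, weighted by beta, on top of the standard SFT loss. Below is a minimal, illustrative PyTorch sketch of that objective; it is a simplification for intuition, not the LLaMA-Factory implementation, and the function name and inputs are hypothetical.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, beta=0.05):
    """Illustrative ORPO objective (simplified).

    chosen_logps / rejected_logps: average per-token log-probabilities
    of the chosen and rejected responses under the policy model.
    """
    # log(odds(y)) = log(p / (1 - p)), computed in log-space for stability
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: push the chosen response's odds above the rejected one's
    ratio = F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Total loss = NLL on the chosen response + beta-weighted odds-ratio penalty
    sft_loss = -chosen_logps
    return (sft_loss - beta * ratio).mean()

# Example: a batch of two preference pairs
loss = orpo_loss(torch.tensor([-1.2, -0.8]), torch.tensor([-2.3, -1.9]))
```

Because the small beta keeps the odds-ratio term a gentle regularizer, training stays close to ordinary supervised fine-tuning on the chosen responses while still discouraging the rejected ones.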
Core Capabilities
- Bilingual proficiency in Chinese and English
- Advanced roleplaying abilities
- Strong mathematical reasoning
- Function calling capabilities
- Matches GPT-4 performance on Chinese benchmarks: 66.1% on C-Eval and 70.28% on CMMLU
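To try the model's bilingual chat capabilities, here is a minimal inference sketch using the Hugging Face transformers library. The repo id and the generation settings are assumptions for illustration, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 halves memory vs. fp32; 70B still needs multiple GPUs
    device_map="auto",
)

# Chinese prompt: "Write a seven-character quatrain about spring"
messages = [{"role": "user", "content": "写一首关于春天的七言绝句"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=512, do_sample=True, temperature=0.6, top_p=0.9
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```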
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its Chinese language capabilities, matching GPT-4's performance on major Chinese benchmarks while retaining strong English abilities. It is one of the first LLMs fine-tuned specifically for Chinese and English users, with capabilities spanning roleplay, function calling, and mathematical reasoning.
Q: What are the recommended use cases?
The model excels in bilingual conversations, roleplay scenarios, mathematical problem-solving, and function calling tasks. It's particularly well-suited for applications requiring strong Chinese language understanding and generation capabilities.
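At 70.6B parameters, the model needs substantial GPU memory (roughly 140 GB in bf16). For smaller multi-GPU setups, a rough sketch of 4-bit loading, assuming transformers and bitsandbytes are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights cut memory to roughly a quarter
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 to preserve quality
)

model_id = "shenzhi-wang/Llama3-70B-Chinese-Chat"  # assumed repo id, as above
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

Quantization trades some output quality for memory, so benchmark results quoted above may not be reproduced exactly under 4-bit inference.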