# Breeze-7B-Instruct-v1_0
| Property | Value |
|---|---|
| Parameter Count | 7.49B |
| License | Apache 2.0 |
| Paper | [arXiv:2403.02712](https://arxiv.org/abs/2403.02712) |
| Context Length | 8K tokens |
| Languages | English, Traditional Chinese |
## What is Breeze-7B-Instruct-v1_0?
Breeze-7B-Instruct-v1_0 is an advanced language model developed by MediaTek Research, specifically designed to excel in both Traditional Chinese and English language tasks. Built upon the Mistral-7B architecture, it features an expanded vocabulary of 62,000 tokens (30,000 more than the original) to better support Traditional Chinese processing.
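As a rough illustration of the vocabulary expansion, the sketch below compares token counts for the same Traditional Chinese sentence under the Breeze and Mistral tokenizers. The Hub IDs are assumed from the public repositories; verify them before running.

```python
from transformers import AutoTokenizer

# Assumed Hub IDs; check the MediaTek Research and Mistral AI repositories.
breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v1_0")
mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

text = "今天天氣很好，我們去公園散步吧。"  # a short Traditional Chinese sentence

# With 62k entries covering Traditional Chinese, Breeze should need
# noticeably fewer tokens than Mistral's original 32k vocabulary.
print("Breeze tokens: ", len(breeze_tok.tokenize(text)))
print("Mistral tokens:", len(mistral_tok.tokenize(text)))
```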
## Implementation Details
The model is implemented as a causal decoder-only transformer, fine-tuned from Breeze-7B-Base. It is released in BF16 precision and pairs its expanded tokenizer with the Mistral-7B architecture for efficient processing of English and Traditional Chinese content; a loading sketch follows the list below.
- Expanded vocabulary (62k tokens) optimized for Traditional Chinese
- 8,000 token context window
- Multi-turn dialogue capability
- Roughly 2x faster inference on Traditional Chinese text than the base Mistral-7B, owing to denser tokenization
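The following is a minimal loading-and-generation sketch using the Hugging Face transformers library, assuming the model is hosted under the MediaTek-Research/Breeze-7B-Instruct-v1_0 Hub ID and that its chat template is available to the tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v1_0"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model is released in BF16 precision
    device_map="auto",           # requires the accelerate package
)

# Single-turn instruction, formatted with the model's chat template.
# Prompt: "Briefly introduce Taiwan's night-market culture."
messages = [{"role": "user", "content": "請簡單介紹台灣的夜市文化。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```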
## Core Capabilities
- Strong performance on MT-Bench-tw with a score of 6.0
- Competitive MMLU accuracy of 61.73%
- Excellent performance on Traditional Chinese reasoning and knowledge tasks
- Efficient processing of long-form content
- Support for Q&A, RAG, multi-round chat, and summarization (see the multi-turn sketch after this list)
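For multi-turn dialogue, one common pattern is to accumulate the conversation history as a message list and re-apply the chat template on every turn. The sketch below reuses the model and tokenizer objects from the loading example above; the conversation content is purely illustrative.

```python
# Accumulated history: alternating user/assistant turns, newest question last.
# Topic: museums in Taipei and the National Palace Museum's signature exhibits.
history = [
    {"role": "user", "content": "台北有哪些值得參觀的博物館？"},
    {"role": "assistant", "content": "故宮博物院和台北市立美術館都很值得一看。"},
    {"role": "user", "content": "故宮的鎮館之寶是什麼？"},
]
input_ids = tokenizer.apply_chat_template(
    history, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
reply_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(reply_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```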
## Frequently Asked Questions
### Q: What makes this model unique?
The model's expanded vocabulary and optimization for Traditional Chinese processing, combined with competitive performance against larger models such as GPT-3.5-Turbo on specific tasks, make it particularly valuable for Traditional Chinese applications.
### Q: What are the recommended use cases?
The model excels in Traditional Chinese text generation, Q&A systems, multi-turn conversations, and content summarization. It's particularly suitable for applications requiring both English and Traditional Chinese language capabilities.
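As one concrete pattern for summarization, the source text can be wrapped in an instruction and passed through the same chat template. This reuses the model and tokenizer from the loading sketch above; the prompt wording is an assumption, not a documented format.

```python
# Placeholder for the long-form document to condense.
article = "（此處放入欲摘要的長篇文章）"  # "(insert the article to summarize here)"

# Instruction asking for a three-sentence summary of the article.
messages = [{"role": "user", "content": f"請將以下文章摘要成三句話：\n{article}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
summary_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(summary_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```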