# Breeze-7B-Instruct-v1_0
| Property | Value |
|---|---|
| Parameter Count | 7.49B |
| License | Apache 2.0 |
| Paper | [arXiv:2403.02712](https://arxiv.org/abs/2403.02712) |
| Context Length | 8K tokens |
| Languages | English, Traditional Chinese |
## What is Breeze-7B-Instruct-v1_0?
Breeze-7B-Instruct-v1_0 is an advanced language model developed by MediaTek Research, specifically designed to excel in both Traditional Chinese and English language tasks. Built upon the Mistral-7B architecture, it features an expanded vocabulary of 62,000 tokens (30,000 more than the original) to better support Traditional Chinese processing.
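As a rough illustration of the vocabulary expansion, the sketch below compares token counts for the same Traditional Chinese sentence under the Breeze and Mistral tokenizers. The Hub IDs are assumed from the public repositories; verify them before running.

```python
from transformers import AutoTokenizer

# Assumed Hub IDs; check the MediaTek Research and Mistral AI repositories.
breeze_tok = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v1_0")
mistral_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

text = "今天天氣很好，我們去公園散步吧。"  # a short Traditional Chinese sentence

# With 62k entries covering Traditional Chinese, Breeze should need
# noticeably fewer tokens than Mistral's original 32k vocabulary.
print("Breeze tokens: ", len(breeze_tok.tokenize(text)))
print("Mistral tokens:", len(mistral_tok.tokenize(text)))
```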
## Implementation Details
The model is implemented as a causal decoder-only transformer, fine-tuned from Breeze-7B-Base. It is released in BF16 precision and pairs its expanded tokenizer with the Mistral-7B architecture for efficient processing of English and Traditional Chinese content; a loading sketch follows the list below.
- Expanded vocabulary (62k tokens) optimized for Traditional Chinese
- 8,000 token context window
- Multi-turn dialogue capability
- Roughly 2x faster inference on Traditional Chinese text than the base Mistral-7B, owing to denser tokenization
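The following is a minimal loading-and-generation sketch using the Hugging Face transformers library, assuming the model is hosted under the MediaTek-Research/Breeze-7B-Instruct-v1_0 Hub ID and that its chat template is available to the tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v1_0"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model is released in BF16 precision
    device_map="auto",           # requires the accelerate package
)

# Single-turn instruction, formatted with the model's chat template.
# Prompt: "Briefly introduce Taiwan's night-market culture."
messages = [{"role": "user", "content": "請簡單介紹台灣的夜市文化。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```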
## Core Capabilities
- Strong performance on MT-Bench-tw with a score of 6.0
- Competitive MMLU accuracy of 61.73%
- Excellent performance on Traditional Chinese reasoning and knowledge tasks
- Efficient processing of long-form content
- Support for Q&A, RAG, multi-round chat, and summarization (see the multi-turn sketch after this list)
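For multi-turn dialogue, one common pattern is to accumulate the conversation history as a message list and re-apply the chat template on every turn. The sketch below reuses the model and tokenizer objects from the loading example above; the conversation content is purely illustrative.

```python
# Accumulated history: alternating user/assistant turns, newest question last.
# Topic: museums in Taipei and the National Palace Museum's signature exhibits.
history = [
    {"role": "user", "content": "台北有哪些值得參觀的博物館？"},
    {"role": "assistant", "content": "故宮博物院和台北市立美術館都很值得一看。"},
    {"role": "user", "content": "故宮的鎮館之寶是什麼？"},
]
input_ids = tokenizer.apply_chat_template(
    history, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
reply_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(reply_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```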
## Frequently Asked Questions
### Q: What makes this model unique?
The model's expanded vocabulary and optimization for Traditional Chinese processing, combined with competitive performance against larger models such as GPT-3.5-Turbo on specific tasks, make it particularly valuable for Traditional Chinese applications.
### Q: What are the recommended use cases?
The model excels in Traditional Chinese text generation, Q&A systems, multi-turn conversations, and content summarization. It's particularly suitable for applications requiring both English and Traditional Chinese language capabilities.
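As one concrete pattern for summarization, the source text can be wrapped in an instruction and passed through the same chat template. This reuses the model and tokenizer from the loading sketch above; the prompt wording is an assumption, not a documented format.

```python
# Placeholder for the long-form document to condense.
article = "（此處放入欲摘要的長篇文章）"  # "(insert the article to summarize here)"

# Instruction asking for a three-sentence summary of the article.
messages = [{"role": "user", "content": f"請將以下文章摘要成三句話：\n{article}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
summary_ids = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(summary_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```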