# Breeze-7B-Instruct-v0_1
| Property | Value |
|---|---|
| Parameter Count | 7.49B |
| Model Type | Causal decoder-only transformer |
| License | Apache 2.0 |
| Paper | Technical Report |
| Context Length | 8k tokens |
## What is Breeze-7B-Instruct-v0_1?
Breeze-7B-Instruct-v0_1 is an instruction-tuned language model developed by MediaTek Research, specifically designed to excel at Traditional Chinese language tasks while maintaining strong English capabilities. Built upon Mistral-7B, it features an expanded vocabulary of 62,000 tokens (30,000 more than the base model) to better handle Traditional Chinese characters, resulting in twice the inference speed for Chinese text processing.
## Implementation Details

The model uses a causal decoder-only transformer architecture; its main changes relative to the Mistral-7B base are summarized in the list below. It supports BF16 precision and can be accelerated with Flash Attention 2 for faster inference (see the loading sketch after the list).
- Expanded 62k-token vocabulary optimized for Traditional Chinese
- 8k token context window
- Multi-turn dialogue support
- Built on the Mistral-7B architecture
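To ground the points above, here is a minimal loading-and-chat sketch using the Hugging Face `transformers` API (`attn_implementation` requires transformers ≥ 4.36, and Flash Attention 2 requires the `flash-attn` package and a supported GPU). The sketch assumes the repository ships a chat template; if it does not, the Mistral-style `[INST]` prompt should be built by hand.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0_1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # BF16 precision
    attn_implementation="flash_attention_2",  # assumes flash-attn is installed; omit for default attention
    device_map="auto",
)

# Multi-turn dialogue: prior turns go into the message list,
# and the tokenizer's chat template formats the full conversation.
messages = [
    {"role": "user", "content": "請簡單介紹台北。"},
    {"role": "assistant", "content": "台北是臺灣的政治與經濟中心。"},
    {"role": "user", "content": "那裡有哪些著名景點?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```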
## Core Capabilities
- Strong performance on both Traditional Chinese and English benchmarks
- Roughly 2x faster inference on Traditional Chinese text than the Mistral-7B base, owing to the expanded vocabulary (see the tokenizer comparison below)
- Excels in Q&A, RAG, multi-round chat, and summarization tasks
- Competitive MT-Bench scores (5.7 for Traditional Chinese, 7.1 for English)
- Improved results over the base model on TMMLU+ and other Traditional Chinese benchmarks
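The speed claim above comes from tokenization density: with a 62k vocabulary, the same Traditional Chinese passage encodes into far fewer tokens, so generation takes fewer decoding steps. A quick comparison sketch (assuming both repositories are downloadable; the sample sentence is illustrative and exact counts will vary):

```python
from transformers import AutoTokenizer

breeze = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Instruct-v0_1")
mistral = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

text = "人工智慧正在改變我們的生活方式與工作型態。"

# Fewer tokens per passage -> fewer autoregressive steps -> faster generation.
print("Breeze tokens: ", len(breeze(text)["input_ids"]))
print("Mistral tokens:", len(mistral(text)["input_ids"]))
```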
## Frequently Asked Questions
**Q: What makes this model unique?**

The expanded vocabulary and Traditional Chinese optimization, combined with retained English capability, set it apart. The model achieves faster inference while delivering competitive performance across benchmarks.
**Q: What are the recommended use cases?**

The model is particularly well suited to Traditional Chinese tasks: question answering, chat applications, summarization, and RAG pipelines (a minimal prompt-construction sketch follows). It is designed for production deployment with efficient inference.
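For the RAG use case, retrieved passages can simply be folded into the user turn before applying the chat template from the earlier sketch. A minimal, hypothetical helper (the retrieval step and the Chinese prompt wording are placeholders, not part of the released model):

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Fold retrieved context into a single user turn for the chat template."""
    context = "\n\n".join(passages)
    return [{
        "role": "user",
        "content": f"根據以下資料回答問題。\n\n資料:\n{context}\n\n問題:{question}",
    }]

# Usage: feed the result to tokenizer.apply_chat_template(...) as in the loading sketch.
messages = build_rag_messages("台北101有多高?", ["台北101高508公尺,曾是世界第一高樓。"])
```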