TeleChat-7B
| Property | Value |
|---|---|
| Parameter Count | 7 Billion |
| Model Type | Decoder-only Transformer |
| License | Apache 2.0 |
| Context Length | 8K (expandable to 96K) |
| Paper | Technical Report |
What is TeleChat-7B?
TeleChat-7B is a large language model developed by China Telecom AI Technology Co., Ltd. It is trained on 1.5 trillion tokens of high-quality Chinese and English data and adopts architectural choices such as Rotary Position Embeddings (RoPE) and the SwiGLU activation function. The model demonstrates strong capabilities across general Q&A, knowledge-based tasks, coding, and mathematical reasoning.
Implementation Details
The model implements a decoder-only architecture with several notable design choices:
- 30 transformer layers with a hidden size of 4096 and an FFN hidden size of 12288
- Rotary Position Embeddings (RoPE) for position encoding, paired with FlashAttention-2 for a 20% training speedup
- SwiGLU activation and RMSNorm for improved training stability and performance (a minimal sketch of both follows this list)
- Training supported by DeepSpeed with ZeRO parallel optimization
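For readers unfamiliar with these components, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block using the 4096/12288 dimensions above. Module names like `gate_proj` are illustrative conventions, not TeleChat's actual layer names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias term."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUFFN(nn.Module):
    """Feed-forward block with SwiGLU gating: (SiLU(x W_gate) * x W_up) W_down."""
    def __init__(self, hidden_size: int = 4096, ffn_hidden_size: int = 12288):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.down_proj = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(1, 8, 4096)         # (batch, seq, hidden)
y = SwiGLUFFN()(RMSNorm(4096)(x))   # pre-norm, then SwiGLU FFN
print(y.shape)                      # torch.Size([1, 8, 4096])
```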
Core Capabilities
- Strong performance on benchmarks like MMLU (60.5%), C-Eval (64.6%), and CMMLU (64.3%)
- Excellent Chinese language understanding and generation
- Long-form content generation including work reports, plans, and technical documents
- Multi-turn conversation support, trained with a dedicated mask-loss scheme
- Context-length extrapolation up to 96K using NTK-aware interpolation and attention scaling (see the sketch after this list)
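As a rough illustration of the NTK-aware idea (not TeleChat's exact implementation), the RoPE frequency base is rescaled so that a model trained at 8K positions can be evaluated at much longer contexts; the attention-scaling half of the recipe is not shown. The head dimension below is an assumption for illustration:

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def ntk_scaled_inv_freq(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """NTK-aware scaling: raise the base so low frequencies get interpolated
    while high frequencies stay close to the original.
    `scale` = target context length / training context length."""
    new_base = base * scale ** (dim / (dim - 2))
    return rope_inv_freq(dim, new_base)

# Example: extend an 8K-trained model toward 96K context (scale = 12).
head_dim = 128  # illustrative head dimension, not confirmed for TeleChat-7B
inv_freq = ntk_scaled_inv_freq(head_dim, scale=96_000 / 8_000)
```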
Frequently Asked Questions
Q: What makes this model unique?
TeleChat-7B stands out for its extensive training on 1.5T tokens of high-quality Chinese-English data, its targeted architectural optimizations, and its strong performance across diverse tasks, particularly long-form content generation and multi-turn conversation.
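The multi-turn support mentioned above relies on mask-loss training. Below is a minimal sketch of the common variant of this technique, in which only assistant-turn tokens contribute to the loss; the role labels and token ids are illustrative, and TeleChat's exact masking rules may differ:

```python
import torch

IGNORE_INDEX = -100  # positions with this label are excluded from cross-entropy

def build_labels(turn_token_ids, turn_roles):
    """Concatenate a multi-turn dialogue and mask out everything except bot replies.
    `turn_token_ids`: list of token-id lists, one per turn.
    `turn_roles`: parallel list of "user" / "bot" strings (illustrative format)."""
    input_ids, labels = [], []
    for ids, role in zip(turn_token_ids, turn_roles):
        input_ids.extend(ids)
        # Only assistant turns are supervised; user turns are masked out.
        labels.extend(ids if role == "bot" else [IGNORE_INDEX] * len(ids))
    return torch.tensor(input_ids), torch.tensor(labels)

# Example: three-turn exchange; only the bot's tokens carry loss.
ids, labels = build_labels([[1, 2, 3], [4, 5], [6, 7, 8]], ["user", "bot", "user"])
```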
Q: What are the recommended use cases?
The model excels at general Q&A, long-form content creation (business documents, technical writing), multi-turn conversations, and tasks requiring strong reasoning in both Chinese and English. A minimal inference sketch follows.
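For getting started, here is a hedged loading-and-generation sketch using Hugging Face transformers. The checkpoint id `Tele-AI/telechat-7B` and the `trust_remote_code` requirement are assumptions about how the weights are published; verify both against the official release before running:

```python
# Minimal inference sketch. The repo id and trust_remote_code requirement are
# assumptions; check the official TeleChat release for the exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tele-AI/telechat-7B"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# "Please write a work plan for smart-city construction." -- a long-form
# planning task of the kind this card highlights.
prompt = "请写一份关于智慧城市建设的工作计划。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```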