TeleChat-7B
| Property | Value |
|---|---|
| Parameter Count | 7 Billion |
| Model Type | Decoder-only Transformer |
| License | Apache 2.0 |
| Context Length | 8K (expandable to 96K) |
| Paper | Technical Report |
What is TeleChat-7B?
TeleChat-7B is a large language model developed by China Telecom AI Technology Co., Ltd. It is trained on 1.5 trillion tokens of high-quality Chinese and English data and adopts architectural choices such as Rotary Position Embeddings (RoPE) and the SwiGLU activation function. The model demonstrates strong capabilities across general Q&A, knowledge-based tasks, coding, and mathematical reasoning.
Implementation Details
The model implements a decoder-only architecture with several notable design choices:
- 30 transformer layers with a hidden size of 4096 and an FFN hidden size of 12288
- Rotary Position Embeddings (RoPE) for position encoding, paired with FlashAttention-2 for a 20% training speedup
- SwiGLU activation and RMSNorm for improved training stability and performance (a minimal sketch of both follows this list)
- Training supported by DeepSpeed with ZeRO parallel optimization
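For readers unfamiliar with these components, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block using the 4096/12288 dimensions above. Module names like `gate_proj` are illustrative conventions, not TeleChat's actual layer names:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: no mean subtraction, no bias term."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLUFFN(nn.Module):
    """Feed-forward block with SwiGLU gating: (SiLU(x W_gate) * x W_up) W_down."""
    def __init__(self, hidden_size: int = 4096, ffn_hidden_size: int = 12288):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
        self.down_proj = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(1, 8, 4096)         # (batch, seq, hidden)
y = SwiGLUFFN()(RMSNorm(4096)(x))   # pre-norm, then SwiGLU FFN
print(y.shape)                      # torch.Size([1, 8, 4096])
```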
Core Capabilities
- Strong performance on benchmarks like MMLU (60.5%), C-Eval (64.6%), and CMMLU (64.3%)
- Excellent Chinese language understanding and generation
- Long-form content generation including work reports, plans, and technical documents
- Multi-turn conversation support, trained with a dedicated mask-loss scheme
- Context-length extrapolation up to 96K using NTK-aware interpolation and attention scaling (see the sketch after this list)
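As a rough illustration of the NTK-aware idea (not TeleChat's exact implementation), the RoPE frequency base is rescaled so that a model trained at 8K positions can be evaluated at much longer contexts; the attention-scaling half of the recipe is not shown. The head dimension below is an assumption for illustration:

```python
import torch

def rope_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies for a head dimension `dim`."""
    return 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))

def ntk_scaled_inv_freq(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """NTK-aware scaling: raise the base so low frequencies get interpolated
    while high frequencies stay close to the original.
    `scale` = target context length / training context length."""
    new_base = base * scale ** (dim / (dim - 2))
    return rope_inv_freq(dim, new_base)

# Example: extend an 8K-trained model toward 96K context (scale = 12).
head_dim = 128  # illustrative head dimension, not confirmed for TeleChat-7B
inv_freq = ntk_scaled_inv_freq(head_dim, scale=96_000 / 8_000)
```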
Frequently Asked Questions
Q: What makes this model unique?
TeleChat-7B stands out for its extensive training on 1.5T tokens of high-quality Chinese-English data, its targeted architectural optimizations, and its strong performance across diverse tasks, particularly long-form content generation and multi-turn conversation.
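The multi-turn support mentioned above relies on mask-loss training. Below is a minimal sketch of the common variant of this technique, in which only assistant-turn tokens contribute to the loss; the role labels and token ids are illustrative, and TeleChat's exact masking rules may differ:

```python
import torch

IGNORE_INDEX = -100  # positions with this label are excluded from cross-entropy

def build_labels(turn_token_ids, turn_roles):
    """Concatenate a multi-turn dialogue and mask out everything except bot replies.
    `turn_token_ids`: list of token-id lists, one per turn.
    `turn_roles`: parallel list of "user" / "bot" strings (illustrative format)."""
    input_ids, labels = [], []
    for ids, role in zip(turn_token_ids, turn_roles):
        input_ids.extend(ids)
        # Only assistant turns are supervised; user turns are masked out.
        labels.extend(ids if role == "bot" else [IGNORE_INDEX] * len(ids))
    return torch.tensor(input_ids), torch.tensor(labels)

# Example: three-turn exchange; only the bot's tokens carry loss.
ids, labels = build_labels([[1, 2, 3], [4, 5], [6, 7, 8]], ["user", "bot", "user"])
```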
Q: What are the recommended use cases?
The model excels at general Q&A, long-form content creation (business documents, technical writing), multi-turn conversations, and tasks requiring strong reasoning in both Chinese and English. A minimal inference sketch follows.
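For getting started, here is a hedged loading-and-generation sketch using Hugging Face transformers. The checkpoint id `Tele-AI/telechat-7B` and the `trust_remote_code` requirement are assumptions about how the weights are published; verify both against the official release before running:

```python
# Minimal inference sketch. The repo id and trust_remote_code requirement are
# assumptions; check the official TeleChat release for the exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tele-AI/telechat-7B"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# "Please write a work plan for smart-city construction." -- a long-form
# planning task of the kind this card highlights.
prompt = "请写一份关于智慧城市建设的工作计划。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```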