Trillion-7B-preview
| Property | Value |
|---|---|
| Parameter Count | 7.76B |
| Model Type | Causal Language Model |
| Architecture | Transformer Decoder with RoPE, SwiGLU, RMSNorm |
| Context Length | 4,096 tokens |
| Training Tokens | 2T |
| License | Apache-2.0 |
What is Trillion-7B-preview?
Trillion-7B-preview is an advanced multilingual language model that pushes the boundaries of efficiency and performance. With 7.76B parameters, it achieves competitive results while using significantly less training compute (~9.3×10²² FLOPs) than comparable models. The model particularly excels in Korean language tasks while maintaining strong capabilities in English, Japanese, and Chinese.
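As a quick sanity check, the quoted compute lines up with the parameter and token counts above under the common 6·N·D approximation for dense-transformer training FLOPs (this is a standard rule of thumb, not a figure from the release itself):

```python
# Rough training-compute estimate with the common 6 * N * D approximation.
# This is an illustrative assumption, not an official number from the model card.
n_params = 7.76e9   # model parameters
n_tokens = 2.0e12   # training tokens
approx_flops = 6 * n_params * n_tokens
print(f"{approx_flops:.2e} FLOPs")  # ~9.31e+22, consistent with the quoted ~9.3e22
```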
Implementation Details
The model features a sophisticated architecture combining a Transformer Decoder with RoPE (Rotary Position Embedding), SwiGLU activation, and RMSNorm. It uses a vocabulary of 128,128 tokens and handles contexts up to 4,096 tokens in length. The model was trained on 2 trillion tokens and went through both pre-training and post-training phases.
- 32 transformer layers with 32 attention heads
- Efficient compute utilization with a strong performance-to-FLOP ratio
- Comprehensive multilingual support with emphasis on Asian languages
- Built-in chat template functionality for easy integration with standard inference tooling (see the sketch below)
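As an illustration of the chat-template workflow, here is a minimal inference sketch with Hugging Face transformers. The repo id trillionlabs/Trillion-7B-preview, the bfloat16/device_map settings, and the sample prompt are assumptions; defer to the usage instructions on the official model card.

```python
# Minimal chat-template inference sketch (assumed repo id; verify against the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "trillionlabs/Trillion-7B-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "What is the capital of South Korea?" in Korean, exercising the model's strongest language.
messages = [{"role": "user", "content": "대한민국의 수도는 어디인가요?"}]

# The built-in chat template formats the conversation into model-ready token ids.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```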
Core Capabilities
- Strong performance in general reasoning and reading comprehension
- Excellent results in Korean language benchmarks (80.02% on HAERAE)
- Competitive mathematical reasoning capabilities (72.25% on GSM8k)
- Robust instruction following across multiple languages
- Effective coding abilities with 55.48% pass@1 on HumanEval
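For context on the coding figure, pass@1 is conventionally reported with the unbiased pass@k estimator from the HumanEval (Codex) paper. The sketch below illustrates that metric; it is not evaluation code from the Trillion release, and the example numbers are made up.

```python
# Unbiased pass@k estimator from Chen et al., 2021 (HumanEval/Codex).
# Illustrative only; the 55.48% pass@1 above comes from the model's reported evaluation.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per problem, c = samples that pass, k = evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per problem, 6 passing -> pass@1 estimate of 0.6
print(pass_at_k(n=10, c=6, k=1))
```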
Frequently Asked Questions
Q: What makes this model unique?
The model's key differentiator is its ability to achieve high performance while using significantly fewer computational resources than competing models. It particularly excels in Korean language tasks while maintaining strong capabilities across English, Japanese, and Chinese, making it a genuinely efficient multilingual model.
Q: What are the recommended use cases?
The model is well-suited for multilingual applications, particularly those involving Asian languages. It shows strong performance in instruction following, mathematical reasoning, coding tasks, and general language understanding. However, users should note the knowledge cutoff date of August 2023 and the current lack of comprehensive safety features.