# Skywork-13B-base
| Property | Value |
|---|---|
| Model Size | 13B parameters |
| Training Data | 3.2T tokens |
| Language Distribution | 52.2% English, 39.6% Chinese, 8% Code |
| License | Skywork Community License |
| Paper | Technical Report |
## What is Skywork-13B-base?
Skywork-13B-base is a bilingual English-Chinese foundation model built on a "thin and deep" architecture: 52 transformer layers with a narrower hidden size, in contrast to the 40 wider layers typical of 13B-class models. It reports strong results across multiple benchmarks, including C-Eval (60.6%), CMMLU (61.8%), and MMLU (62.1%).
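As a quick orientation, here is a minimal loading-and-generation sketch using Hugging Face `transformers`. The hub id `Skywork/Skywork-13B-base` and the `trust_remote_code` flag are assumptions about the release layout rather than something this card specifies; adjust them to the checkpoint you actually use.

```python
# Minimal sketch: load Skywork-13B-base and run greedy generation.
# Assumes the hub id "Skywork/Skywork-13B-base" and that the release
# ships custom modeling code (hence trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-13B-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 13B in bf16 needs roughly 26 GB of GPU memory
    device_map="auto",
    trust_remote_code=True,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```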
## Implementation Details
The model uses a modified transformer architecture with a hidden dimension of 4,608 and an FFN dimension of 12,288, together with a custom 65,536-token vocabulary optimized for bilingual text. The tokenizer is a byte-pair-encoding (BPE) tokenizer whose vocabulary reserves dedicated ranges for Latin-script tokens, Chinese characters, and special tokens.
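To see the bilingual vocabulary in action, the sketch below loads the tokenizer and encodes an English string and its Chinese counterpart. As above, the hub id and `trust_remote_code` flag are assumptions about the release layout.

```python
# Sketch: inspect the bilingual BPE vocabulary (hub id assumed as above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Skywork/Skywork-13B-base", trust_remote_code=True
)
print(tokenizer.vocab_size)  # expected: 65536, per the table above

# Chinese text should tokenize compactly thanks to the dedicated
# Chinese-character range in the vocabulary.
for text in ["large language model", "大语言模型"]:
    ids = tokenizer.encode(text, add_special_tokens=False)
    print(f"{text!r} -> {len(ids)} tokens: {ids}")
```

Key architecture figures: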
- 52 transformer layers (vs. 40 in comparable 13B models)
- Hidden dimension of 4,608
- 36 attention heads
- Custom vocabulary of 65,536 tokens
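These figures are consistent with the nominal 13B size. The back-of-envelope check below assumes a LLaMA-style decoder block with a SwiGLU FFN and an untied output head; that block layout is an assumption for illustration, since the card only lists the dimensions.

```python
# Back-of-envelope parameter count from the figures above.
# Assumes a LLaMA-style decoder block with SwiGLU FFN (three FFN
# matrices) and an untied output head; the actual layout may differ.
layers, d_model, d_ffn, vocab = 52, 4608, 12288, 65536

attn = 4 * d_model * d_model      # Q, K, V and output projections
ffn = 3 * d_model * d_ffn         # gate, up and down projections
per_layer = attn + ffn

embeddings = 2 * vocab * d_model  # input embedding + LM head
total = layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")  # ~13.9B under these assumptions
```

Under these assumptions the count lands near 13.9B, the usual ballpark for a "13B-class" label once embeddings and the output head are included.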
## Core Capabilities
- Strong bilingual (English-Chinese) understanding and generation
- Strong performance on technical and academic content
- Efficient code understanding and processing
- State-of-the-art results on multiple Chinese and English benchmarks
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive "thin and deep" architecture with 52 layers and optimized dimensions, combined with its carefully curated multilingual training data, sets it apart from other models in its class.
**Q: What are the recommended use cases?**
The model excels in bilingual applications, academic content processing, technical documentation, and code-related tasks. It's particularly well-suited for applications requiring strong Chinese-English capabilities.
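As a usage illustration of the bilingual case, here is a minimal generation sketch using the `transformers` pipeline API, with the same assumed hub id as above. The Chinese prompt translates to "The capital of China is".

```python
# Sketch: bilingual generation via the transformers pipeline API.
# Hub id and trust_remote_code are the same assumptions as above.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Skywork/Skywork-13B-base",
    trust_remote_code=True,
    device_map="auto",
)

# English prompt and its Chinese counterpart ("The capital of China is").
for prompt in ["The capital of China is", "中国的首都是"]:
    out = generator(prompt, max_new_tokens=16, do_sample=False)
    print(out[0]["generated_text"])
```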