# Skywork-13B-base
| Property | Value |
|---|---|
| Model Size | 13B parameters |
| Training Data | 3.2T tokens |
| Language Distribution | 52.2% English, 39.6% Chinese, 8% Code |
| License | Skywork Community License |
| Paper | Technical Report |
## What is Skywork-13B-base?
Skywork-13B-base is a bilingual English-Chinese foundation model built on a "thin and deep" architecture: 52 transformer layers with a narrower hidden size, in contrast to the 40 wider layers typical of 13B-class models. It reports strong results across multiple benchmarks, including C-Eval (60.6%), CMMLU (61.8%), and MMLU (62.1%).
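As a quick orientation, here is a minimal loading-and-generation sketch using Hugging Face `transformers`. The hub id `Skywork/Skywork-13B-base` and the `trust_remote_code` flag are assumptions about the release layout rather than something this card specifies; adjust them to the checkpoint you actually use.

```python
# Minimal sketch: load Skywork-13B-base and run greedy generation.
# Assumes the hub id "Skywork/Skywork-13B-base" and that the release
# ships custom modeling code (hence trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Skywork/Skywork-13B-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 13B in bf16 needs roughly 26 GB of GPU memory
    device_map="auto",
    trust_remote_code=True,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```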
## Implementation Details
The model uses a modified transformer architecture with a hidden dimension of 4,608 and an FFN dimension of 12,288, together with a custom 65,536-token vocabulary optimized for bilingual text. The tokenizer is a byte-pair-encoding (BPE) tokenizer whose vocabulary reserves dedicated ranges for Latin-script tokens, Chinese characters, and special tokens.
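To see the bilingual vocabulary in action, the sketch below loads the tokenizer and encodes an English string and its Chinese counterpart. As above, the hub id and `trust_remote_code` flag are assumptions about the release layout.

```python
# Sketch: inspect the bilingual BPE vocabulary (hub id assumed as above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "Skywork/Skywork-13B-base", trust_remote_code=True
)
print(tokenizer.vocab_size)  # expected: 65536, per the table above

# Chinese text should tokenize compactly thanks to the dedicated
# Chinese-character range in the vocabulary.
for text in ["large language model", "大语言模型"]:
    ids = tokenizer.encode(text, add_special_tokens=False)
    print(f"{text!r} -> {len(ids)} tokens: {ids}")
```

Key architecture figures: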
- 52 transformer layers (vs. 40 in comparable 13B models)
- Hidden dimension of 4,608
- 36 attention heads
- Custom vocabulary of 65,536 tokens
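These figures are consistent with the nominal 13B size. The back-of-envelope check below assumes a LLaMA-style decoder block with a SwiGLU FFN and an untied output head; that block layout is an assumption for illustration, since the card only lists the dimensions.

```python
# Back-of-envelope parameter count from the figures above.
# Assumes a LLaMA-style decoder block with SwiGLU FFN (three FFN
# matrices) and an untied output head; the actual layout may differ.
layers, d_model, d_ffn, vocab = 52, 4608, 12288, 65536

attn = 4 * d_model * d_model      # Q, K, V and output projections
ffn = 3 * d_model * d_ffn         # gate, up and down projections
per_layer = attn + ffn

embeddings = 2 * vocab * d_model  # input embedding + LM head
total = layers * per_layer + embeddings
print(f"{total / 1e9:.1f}B parameters")  # ~13.9B under these assumptions
```

Under these assumptions the count lands near 13.9B, the usual ballpark for a "13B-class" label once embeddings and the output head are included.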
## Core Capabilities
- Strong bilingual (English-Chinese) understanding and generation
- Strong performance on technical and academic content
- Efficient code understanding and processing
- State-of-the-art results on multiple Chinese and English benchmarks
## Frequently Asked Questions
**Q: What makes this model unique?**
The model's distinctive "thin and deep" architecture with 52 layers and optimized dimensions, combined with its carefully curated multilingual training data, sets it apart from other models in its class.
**Q: What are the recommended use cases?**
The model excels in bilingual applications, academic content processing, technical documentation, and code-related tasks. It's particularly well-suited for applications requiring strong Chinese-English capabilities.
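As a usage illustration of the bilingual case, here is a minimal generation sketch using the `transformers` pipeline API, with the same assumed hub id as above. The Chinese prompt translates to "The capital of China is".

```python
# Sketch: bilingual generation via the transformers pipeline API.
# Hub id and trust_remote_code are the same assumptions as above.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Skywork/Skywork-13B-base",
    trust_remote_code=True,
    device_map="auto",
)

# English prompt and its Chinese counterpart ("The capital of China is").
for prompt in ["The capital of China is", "中国的首都是"]:
    out = generator(prompt, max_new_tokens=16, do_sample=False)
    print(out[0]["generated_text"])
```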