Polyglot-Ko-1.3B
| Property | Value |
|---|---|
| Parameters | 1.3B |
| Architecture | GPT-NeoX |
| Training Data | 863GB Korean Text |
| License | Apache 2.0 |
| Paper | Technical Report |
What is polyglot-ko-1.3b?
Polyglot-Ko-1.3B is a large-scale Korean language model developed by EleutherAI's Polyglot team. It was trained on a diverse 863GB corpus of Korean text, making it one of the most comprehensive Korean language models openly available. The model uses a transformer architecture with 24 layers, a hidden dimension of 2048, and 16 attention heads.
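A minimal usage sketch with the Hugging Face Transformers library, assuming the model is hosted on the Hub under the id EleutherAI/polyglot-ko-1.3b and that transformers and torch are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/polyglot-ko-1.3b"  # assumed Hugging Face Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # on GPU: add torch_dtype=torch.float16 and model.to("cuda")

prompt = "한국의 수도는"  # "The capital of Korea is ..."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```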
Implementation Details
The model is built on the GPT-NeoX framework and was trained on 256 A100 GPUs for 102,000 steps, processing 213 billion tokens. It uses Rotary Position Embedding (RoPE) and has a 2,048-token context window. The key architectural dimensions are summarized below; a short config-check sketch follows the list.
- 24 transformer layers with a model dimension of 2048
- Feed-forward dimension of 8192
- 16 attention heads with 128 dimensions each
- Vocabulary size of 30,003
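These figures can be sanity-checked against the published configuration. A small sketch, assuming the standard field names of the Transformers GPT-NeoX config and the same hub id as above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/polyglot-ko-1.3b")  # assumed hub id
print(config.num_hidden_layers)        # 24 transformer layers
print(config.hidden_size)              # 2048 model dimension
print(config.intermediate_size)        # 8192 feed-forward dimension
print(config.num_attention_heads)      # 16 heads (2048 / 16 = 128 dims per head)
print(config.max_position_embeddings)  # 2048-token context window
print(config.vocab_size)               # ~30,003 (embedding matrix may be padded slightly)
```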
Core Capabilities
- Strong performance on Korean language understanding tasks
- Competitive results on KOBEST benchmark
- PII masking applied during data preprocessing to protect personal information
- Suitable for text generation and completion tasks (see the pipeline sketch below)
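For quick generation and completion experiments, the high-level pipeline API is sufficient; a sketch assuming the same hub id:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/polyglot-ko-1.3b")  # assumed hub id
result = generator("오늘 날씨가 좋아서", max_new_tokens=40, do_sample=True, top_p=0.9)  # "Since the weather is nice today ..."
print(result[0]["generated_text"])
```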
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for its specialized Korean-language capabilities and strong performance relative to its modest size. On the KOBEST benchmark it outperforms Korean models of comparable size, and on several tasks even larger ones.
Q: What are the recommended use cases?
The model excels at Korean text generation, sentiment analysis, and various downstream NLP tasks. It performs particularly well on the KOBEST COPA (causal reasoning), HellaSwag (commonsense reasoning), and SentiNeg (sentiment analysis) tasks.
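Because this is a base model rather than an instruction-tuned one, classification-style tasks such as sentiment analysis are typically handled with few-shot prompting. A hypothetical sketch; the prompt format and the 긍정/부정 (positive/negative) labels here are illustrative and not taken from the SentiNeg evaluation setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/polyglot-ko-1.3b"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative few-shot prompt: two labeled examples, then a sentence to classify.
prompt = (
    "문장: 배송이 빠르고 제품이 훌륭해요.\n감정: 긍정\n\n"
    "문장: 품질이 너무 별로라서 실망했어요.\n감정: 부정\n\n"
    "문장: 가격 대비 성능이 정말 만족스럽습니다.\n감정:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2, do_sample=False)
# Decode only the newly generated tokens (the predicted label).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```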