Crystal
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| License | Apache 2.0 |
| Paper | Research Paper |
| Training Data | SlimPajama and StarCoder |
| Architecture | LLaMA-based with muP modifications |
What is Crystal?
Crystal is a 7B-parameter language model built to balance natural-language and coding ability. Trained on 1.4 trillion tokens from the SlimPajama and StarCoder datasets, it performs well on both natural language processing and coding tasks. Despite using fewer training tokens than LLaMA 2, Crystal achieves superior scores on several benchmarks, including MMLU, HumanEval, and MBPP.
Implementation Details
Crystal follows a LLaMA-style, decoder-only architecture but incorporates maximal update parameterization (muP), which lets hyperparameters tuned on a small proxy model transfer to the full-scale model. Training is split into three stages, each processing different portions of the SlimPajama and StarCoder data. The architecture also includes specialized embedding scaling, adjusted attention scaling, and custom learning-rate rules; a minimal sketch of the muP-style scaling follows the list below.
- Custom tokenizer with a 32,032-token vocabulary
- Training sequence length of 2,048 tokens
- LayerNorm instead of RMSNorm
- Rotary position embeddings applied to 25% of hidden dimensions
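As a rough illustration of what muP changes in practice, here is a minimal sketch of the two rules most relevant to the details above: embedding-output scaling and width-scaled learning rates. The widths, scale constant, and function names below are hypothetical, not taken from Crystal's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical widths for illustration; Crystal's actual proxy/base
# widths are not stated in this card.
BASE_WIDTH = 256      # width of the small proxy model used for HP tuning
TARGET_WIDTH = 4096   # width of the full model
WIDTH_MULT = TARGET_WIDTH / BASE_WIDTH


class MupEmbedding(nn.Module):
    """Token embedding with muP-style output scaling: multiplying the
    embedding output by a constant keeps activations O(1) as width grows."""

    def __init__(self, vocab_size: int, width: int, emb_scale: float = 10.0):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, width)
        self.emb_scale = emb_scale  # tunable constant; value here is made up

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb_scale * self.emb(token_ids)


def mup_adam_lr(base_lr: float, is_hidden_matrix: bool) -> float:
    """muP learning-rate rule for Adam: hidden (matrix-like) weights get
    their LR divided by the width multiplier; embeddings, biases, and
    norms keep the base LR."""
    return base_lr / WIDTH_MULT if is_hidden_matrix else base_lr


# Example: a base LR of 1e-3 tuned on the proxy becomes 6.25e-5 for
# the hidden weight matrices of the wide model.
print(mup_adam_lr(1e-3, is_hidden_matrix=True))
```

The payoff of this parameterization is that learning rates and initializers tuned cheaply on the narrow proxy carry over to the 7B model without re-tuning.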
Core Capabilities
- Strong performance on coding benchmarks (HumanEval, MBPP)
- Strong natural-language understanding (MMLU, ARC)
- Balanced performance across language and coding tasks
- Support for fill-in-the-middle (FIM) inference (see the sketch after this list)
- Specialized handling of code metadata and instruction tuning
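Because this card does not spell out Crystal's FIM format, the sketch below assumes StarCoder-style sentinel tokens (Crystal's code data comes from StarCoder); verify against the tokenizer's special tokens before relying on them.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange a code prefix and suffix around FIM sentinel tokens; the
    model then generates the missing middle after <fim_middle>. The token
    names are an assumption here (StarCoder conventions)."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


prompt = build_fim_prompt(
    prefix="def mean(xs):\n    total = ",
    suffix="\n    return total / len(xs)\n",
)
# Feed `prompt` to the model; the completion is the missing middle,
# e.g. "sum(xs)".
```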
Frequently Asked Questions
Q: What makes this model unique?
Crystal stands out for its balanced performance on both coding and language tasks, achieved through its muP parameterization and three-stage training schedule. Despite using fewer training tokens, it outperforms some larger models on several benchmarks.
Q: What are the recommended use cases?
The model performs well on both programming and natural-language tasks, making it a good fit for code generation, technical documentation, and general language understanding. It is particularly well suited to applications that need coding and natural-language ability in a single model.
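For readers who want to try the model, here is a minimal generation sketch using the Hugging Face transformers library. The repo id is a placeholder, and trust_remote_code is enabled on the assumption that the checkpoint ships custom muP modeling code; adjust both to match the actual release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org/crystal-7b"  # placeholder; replace with the real repo id

# trust_remote_code assumes the checkpoint bundles its own modeling files
# (common for LLaMA variants with muP modifications).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

inputs = tokenizer(
    "Write a Python function that reverses a string.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```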