Crystal
| Property | Value |
|---|---|
| Parameter Count | 7 billion |
| License | Apache 2.0 |
| Paper | Research Paper |
| Training Data | SlimPajama and StarCoder |
| Architecture | LLaMA-based with muP modifications |
What is Crystal?
Crystal is a 7B-parameter language model built to balance natural-language and coding ability. Trained on 1.4 trillion tokens from the SlimPajama and StarCoder datasets, it performs well on both natural language processing and coding tasks. Despite using fewer training tokens than LLaMA 2, Crystal achieves superior scores on several benchmarks, including MMLU, HumanEval, and MBPP.
Implementation Details
Crystal follows a LLaMA-style, decoder-only architecture but incorporates maximal update parameterization (muP), which lets hyperparameters tuned on a small proxy model transfer to the full-scale model. Training is split into three stages, each processing different portions of the SlimPajama and StarCoder data. The architecture also includes specialized embedding scaling, adjusted attention scaling, and custom learning-rate rules; a minimal sketch of the muP-style scaling follows the list below.
- Custom tokenizer with a 32,032-token vocabulary
- Training sequence length of 2,048 tokens
- LayerNorm instead of RMSNorm
- Rotary position embeddings applied to 25% of hidden dimensions
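As a rough illustration of what muP changes in practice, here is a minimal sketch of the two rules most relevant to the details above: embedding-output scaling and width-scaled learning rates. The widths, scale constant, and function names below are hypothetical, not taken from Crystal's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical widths for illustration; Crystal's actual proxy/base
# widths are not stated in this card.
BASE_WIDTH = 256      # width of the small proxy model used for HP tuning
TARGET_WIDTH = 4096   # width of the full model
WIDTH_MULT = TARGET_WIDTH / BASE_WIDTH


class MupEmbedding(nn.Module):
    """Token embedding with muP-style output scaling: multiplying the
    embedding output by a constant keeps activations O(1) as width grows."""

    def __init__(self, vocab_size: int, width: int, emb_scale: float = 10.0):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, width)
        self.emb_scale = emb_scale  # tunable constant; value here is made up

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb_scale * self.emb(token_ids)


def mup_adam_lr(base_lr: float, is_hidden_matrix: bool) -> float:
    """muP learning-rate rule for Adam: hidden (matrix-like) weights get
    their LR divided by the width multiplier; embeddings, biases, and
    norms keep the base LR."""
    return base_lr / WIDTH_MULT if is_hidden_matrix else base_lr


# Example: a base LR of 1e-3 tuned on the proxy becomes 6.25e-5 for
# the hidden weight matrices of the wide model.
print(mup_adam_lr(1e-3, is_hidden_matrix=True))
```

The payoff of this parameterization is that learning rates and initializers tuned cheaply on the narrow proxy carry over to the 7B model without re-tuning.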
Core Capabilities
- Strong performance on coding benchmarks (HumanEval, MBPP)
- Strong natural-language understanding (MMLU, ARC)
- Balanced performance across language and coding tasks
- Support for fill-in-the-middle (FIM) inference (see the sketch after this list)
- Specialized handling of code metadata and instruction tuning
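Because this card does not spell out Crystal's FIM format, the sketch below assumes StarCoder-style sentinel tokens (Crystal's code data comes from StarCoder); verify against the tokenizer's special tokens before relying on them.

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange a code prefix and suffix around FIM sentinel tokens; the
    model then generates the missing middle after <fim_middle>. The token
    names are an assumption here (StarCoder conventions)."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


prompt = build_fim_prompt(
    prefix="def mean(xs):\n    total = ",
    suffix="\n    return total / len(xs)\n",
)
# Feed `prompt` to the model; the completion is the missing middle,
# e.g. "sum(xs)".
```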
Frequently Asked Questions
Q: What makes this model unique?
Crystal stands out for its balanced performance on both coding and language tasks, achieved through its muP parameterization and three-stage training schedule. Despite using fewer training tokens, it outperforms some larger models on several benchmarks.
Q: What are the recommended use cases?
The model performs well on both programming and natural-language tasks, making it a good fit for code generation, technical documentation, and general language understanding. It is particularly well suited to applications that need coding and natural-language ability in a single model.
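For readers who want to try the model, here is a minimal generation sketch using the Hugging Face transformers library. The repo id is a placeholder, and trust_remote_code is enabled on the assumption that the checkpoint ships custom muP modeling code; adjust both to match the actual release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "org/crystal-7b"  # placeholder; replace with the real repo id

# trust_remote_code assumes the checkpoint bundles its own modeling files
# (common for LLaMA variants with muP modifications).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

inputs = tokenizer(
    "Write a Python function that reverses a string.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```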