Zurich-14B-GCv2-500k
| Property | Value |
|---|---|
| Base Model | Qwen 2.5 14B Instruct |
| Parameter Count | 14.7B (13.1B Non-Embedding) |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm |
| Training Dataset | GammaCorpus v2-500k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-500k?
Zurich-14B-GCv2-500k is a fine-tune of Alibaba's Qwen 2.5 14B Instruct model, trained on the GammaCorpus v2-500k dataset. It pairs Qwen 2.5's architecture with GammaCorpus's structured conversational training data.
Implementation Details
The model inherits Qwen 2.5's 48-layer transformer architecture and uses Grouped Query Attention (GQA) with a 40/8 split between query and key-value heads. Fine-tuning was efficient, completing in approximately 40 minutes on a single A100 GPU via the Unsloth framework, over 60 epochs.
- Grouped Query Attention with 40 query heads and 8 key-value heads
- Implements RoPE (Rotary Position Embedding)
- Uses SwiGLU activation and RMSNorm
- Uses attention QKV bias, as in Qwen 2.5
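To illustrate the 40/8 GQA split listed above, here is a minimal NumPy sketch (illustrative dimensions and random data, not the model's actual weights): each of the 8 key-value heads is shared by a group of 5 query heads.

```python
import numpy as np

# Head configuration described above: 40 query heads, 8 KV heads, 128-dim heads
n_q, n_kv, d = 40, 8, 128
group = n_q // n_kv  # 5 query heads share each key-value head
seq = 4              # toy sequence length

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q, seq, d))
k = rng.standard_normal((n_kv, seq, d))
v = rng.standard_normal((n_kv, seq, d))

# Expand each KV head across its group of query heads (the "repeat_kv" step)
k = np.repeat(k, group, axis=0)  # (40, seq, d)
v = np.repeat(v, group, axis=0)  # (40, seq, d)

# Scaled dot-product attention per head
scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)          # (40, seq, seq)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v                                        # (40, seq, d)
print(out.shape)  # (40, 4, 128)
```

Storing 8 KV heads instead of 40 shrinks the KV cache by 5x, which is the main efficiency benefit of GQA at inference time.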
Core Capabilities
- Enhanced instruction following abilities inherited from Qwen 2.5
- Structured conversation handling from GammaCorpus training
- Efficient processing with optimized attention mechanisms
- Balanced performance across various language tasks
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness stems from combining Qwen 2.5's architecture with GammaCorpus v2-500k training data, alongside its GQA configuration and efficient fine-tuning process.
Q: What are the recommended use cases?
This model is particularly well-suited for structured conversations, general language understanding tasks, and applications requiring balanced performance between efficiency and capability.
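For the structured-conversation use case, Qwen-family models expect prompts in the ChatML format; in practice a tokenizer's built-in chat template handles this, but a minimal sketch of the formatting looks like:

```python
def to_chatml(messages):
    """Render {role, content} turns in ChatML (the chat format used by
    Qwen-family models), ending with a generation prompt for the model."""
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return text + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise Grouped Query Attention in one sentence."},
]
print(to_chatml(messages))
```

In real use, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the Transformers library, which applies the template shipped with the model.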