Zurich-14B-GCv2-10k
| Property | Value |
|---|---|
| Base Model | Qwen 2.5 14B Instruct |
| Parameter Count | 14.7B (13.1B Non-Embedding) |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and QKV bias |
| Training Dataset | GammaCorpus v2-10k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-10k?
Zurich-14B-GCv2-10k is a language model built on Alibaba's Qwen 2.5 14B Instruct and fine-tuned on the GammaCorpus v2-10k dataset. The goal of the fine-tune is to outperform comparable models in its size category while showcasing the capabilities of the GammaCorpus dataset.
Implementation Details
The model is a 48-layer transformer that uses Grouped Query Attention (GQA) with 40 query heads and 8 key-value heads, together with Rotary Position Embeddings (RoPE), SwiGLU activations, and RMSNorm. Training was lightweight: roughly 10 minutes on a single A100 GPU for 60 epochs, using the Unsloth framework.
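These figures can be checked directly against the base model's configuration. A minimal sketch, assuming the `Qwen/Qwen2.5-14B-Instruct` Hugging Face repository (the fine-tune inherits the same architecture):

```python
from transformers import AutoConfig

# Read the architectural hyperparameters from the base model's config.
cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

print(cfg.num_hidden_layers)    # 48 transformer layers
print(cfg.num_attention_heads)  # 40 query heads
print(cfg.num_key_value_heads)  # 8 key-value heads shared via GQA
```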
- Advanced transformer architecture with RoPE and SwiGLU
- Efficient training via the Unsloth framework (see the sketch after this list)
- Optimized with Grouped Query Attention (GQA)
- Built on the robust Qwen 2.5 foundation
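The exact training script is not reproduced here, but a typical Unsloth LoRA fine-tuning run looks roughly like the sketch below. Everything other than the base model and the 60-epoch figure is an assumption: the dataset repository id, the conversation schema, and all hyperparameters are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit for memory-efficient fine-tuning.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=2048,   # assumed sequence length
    load_in_4bit=True,
)

# Attach LoRA adapters (rank, alpha, and target modules are illustrative defaults).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed dataset repo id and schema: each row is expected to hold a
# "conversation" list of {"role", "content"} turns; adjust to the
# dataset's actual layout.
dataset = load_dataset("rubenroy/GammaCorpus-v2-10k", split="train")
dataset = dataset.map(lambda row: {
    "text": tokenizer.apply_chat_template(row["conversation"], tokenize=False)
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        num_train_epochs=60,             # from this card
        learning_rate=2e-4,              # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```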
Core Capabilities
- Advanced language understanding and generation
- Optimized for multi-turn conversations (see the example after this list)
- Structured response generation
- Bias-mitigated outputs
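As an illustration of the conversational tuning, here is a minimal multi-turn generation sketch using the Hugging Face Transformers API; the repository id is a placeholder, since this card does not state where the weights are published:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zurich-14B-GCv2-10k"  # placeholder: substitute the published repo id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A multi-turn exchange rendered through the model's chat template.
messages = [
    {"role": "user", "content": "What is Grouped Query Attention?"},
    {"role": "assistant", "content": "GQA shares each key-value head across several query heads."},
    {"role": "user", "content": "And why does that make inference cheaper?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```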
Frequently Asked Questions
Q: What makes this model unique?
A: The model's uniqueness lies in its combination of the powerful Qwen 2.5 architecture with the specialized GammaCorpus v2-10k dataset, optimized through efficient training techniques and architectural components like GQA and RoPE.
Q: What are the recommended use cases?
A: This model is particularly well-suited for applications requiring structured dialogue generation, multi-turn conversations, and general language understanding tasks. It's designed to provide balanced, bias-aware responses while maintaining high performance across various use cases.