Zurich-14B-GCv2-50k
| Property | Value |
|---|---|
| Parameter Count | 14.7B (13.1B non-embedding) |
| Base Model | Qwen 2.5 14B Instruct |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| Training Dataset | GammaCorpus v2-50k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-50k?
Zurich-14B-GCv2-50k is a language model built on Alibaba's Qwen 2.5 14B Instruct and fine-tuned on the GammaCorpus v2-50k dataset. The model is designed for structured dialogue generation and features 48 transformer layers with grouped-query attention: 40 attention heads for queries and 8 for keys and values.
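Below is a minimal inference sketch using the Hugging Face transformers library. The repository id is inferred from the model name and is an assumption; check the actual Hub listing before use.

```python
# Minimal inference sketch; the repo id below is assumed, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rubenroy/Zurich-14B-GCv2-50k"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a two-sentence summary of rotary position embeddings."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```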
Implementation Details
Fine-tuning ran on a single A100 GPU for approximately 20 minutes, covering 60 epochs with the Unsloth framework. The architecture retains Qwen 2.5's core components: Rotary Position Embedding (RoPE), SwiGLU activations, RMSNorm, and attention QKV bias.
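The training script and hyperparameters are not published here, so the following is only a plausible sketch of an Unsloth LoRA fine-tune; the base checkpoint name, dataset repo id, text field, LoRA rank, batch size, and learning rate are all assumptions.

```python
# Hypothetical Unsloth fine-tuning sketch; every hyperparameter here is an assumption.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-14B-Instruct",  # assumed base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,  # LoRA rank/alpha are guesses
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Dataset repo id and text field are assumptions; adapt to the real GammaCorpus schema.
dataset = load_dataset("rubenroy/GammaCorpus-v2-50k", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=60,  # epoch count taken from this card
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```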
- 48 transformer layers with Group Query Attention (GQA)
- Optimized attention with a 40/8 head split for queries and key-values (see the sketch after this list)
- Built on the robust Qwen 2.5 architecture
- Trained on structured and filtered multi-turn conversations
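To make the 40/8 split concrete, here is a toy PyTorch illustration of grouped-query attention shapes. The head dimension of 128 is assumed from Qwen 2.5 14B's 5120 hidden size divided by 40 heads.

```python
import torch

batch, seq, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 40, 8       # the 40/8 split: each KV head serves 5 query heads
group = n_q_heads // n_kv_heads     # 5

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# GQA: expand each key-value head across its group of query heads, then attend as usual.
k = k.repeat_interleave(group, dim=1)   # (1, 40, 16, 128)
v = v.repeat_interleave(group, dim=1)
attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)  # torch.Size([1, 40, 16, 128])
```

Because only 8 key-value heads are cached instead of 40, the KV cache is roughly 5x smaller at inference time.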
Core Capabilities
- Advanced dialogue generation with structured outputs
- Enhanced context understanding through GammaCorpus training
- Efficient performance with optimized attention mechanisms
- Robust text generation with bias mitigation features
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness stems from its fine-tuning on the carefully curated GammaCorpus v2-50k dataset, combined with an architecture that pairs grouped-query attention with rotary position embeddings. GQA shrinks the key-value cache by sharing each key-value head across five query heads, while RoPE encodes relative position directly in the attention computation, keeping generation efficient without sacrificing quality.
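For readers unfamiliar with rotary embeddings, the toy function below sketches the rotate-half RoPE formulation for a single head. Real implementations apply this per attention head to queries and keys and cache the angles; this is a simplified illustration only.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq, head_dim); rotates pairs of dimensions by position-dependent angles.
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half) / half)           # per-pair frequencies
    angles = torch.arange(seq)[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 128)   # 8 positions, head_dim 128
print(rope(q).shape)      # torch.Size([8, 128])
```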
Q: What are the recommended use cases?
The model excels in structured dialogue generation, making it particularly suitable for conversational AI applications, customer service automation, and general text generation tasks. Its training on filtered multi-turn conversations makes it especially effective for maintaining context in extended dialogues.
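As an illustration of multi-turn use, the sketch below carries the conversation history forward on each turn. As above, the repo id and sampling settings are assumptions.

```python
# Hypothetical multi-turn chat loop; repo id and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rubenroy/Zurich-14B-GCv2-50k"  # assumed Hub path
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

history = [{"role": "system", "content": "You are a helpful assistant."}]
for user_turn in ["What is grouped-query attention?",
                  "How does it differ from standard multi-head attention?"]:
    history.append({"role": "user", "content": user_turn})
    inputs = tok.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    reply = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})  # keep context across turns
    print(reply)
```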