Zurich-14B-GCv2-10k
| Property | Value |
|---|---|
| Base Model | Qwen 2.5 14B Instruct |
| Parameter Count | 14.7B (13.1B Non-Embedding) |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and QKV bias |
| Training Dataset | GammaCorpus v2-10k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-10k?
Zurich-14B-GCv2-10k is a language model built on Alibaba's Qwen 2.5 14B Instruct and fine-tuned on the GammaCorpus v2-10k dataset. The goal of the fine-tune is to outperform comparable models in its size category while showcasing the capabilities of the GammaCorpus dataset.
Implementation Details
The model is a 48-layer transformer that uses Grouped Query Attention (GQA) with 40 query heads and 8 key-value heads, together with Rotary Position Embeddings (RoPE), SwiGLU activations, and RMSNorm. Training was lightweight: roughly 10 minutes on a single A100 GPU for 60 epochs, using the Unsloth framework.
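These figures can be checked directly against the base model's configuration. A minimal sketch, assuming the `Qwen/Qwen2.5-14B-Instruct` Hugging Face repository (the fine-tune inherits the same architecture):

```python
from transformers import AutoConfig

# Read the architectural hyperparameters from the base model's config.
cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-14B-Instruct")

print(cfg.num_hidden_layers)    # 48 transformer layers
print(cfg.num_attention_heads)  # 40 query heads
print(cfg.num_key_value_heads)  # 8 key-value heads shared via GQA
```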
- Advanced transformer architecture with RoPE and SwiGLU
- Efficient training via the Unsloth framework (see the sketch after this list)
- Optimized with Grouped Query Attention (GQA)
- Built on the robust Qwen 2.5 foundation
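The exact training script is not reproduced here, but a typical Unsloth LoRA fine-tuning run looks roughly like the sketch below. Everything other than the base model and the 60-epoch figure is an assumption: the dataset repository id, the conversation schema, and all hyperparameters are illustrative placeholders.

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit for memory-efficient fine-tuning.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-14B-Instruct",
    max_seq_length=2048,   # assumed sequence length
    load_in_4bit=True,
)

# Attach LoRA adapters (rank, alpha, and target modules are illustrative defaults).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Assumed dataset repo id and schema: each row is expected to hold a
# "conversation" list of {"role", "content"} turns; adjust to the
# dataset's actual layout.
dataset = load_dataset("rubenroy/GammaCorpus-v2-10k", split="train")
dataset = dataset.map(lambda row: {
    "text": tokenizer.apply_chat_template(row["conversation"], tokenize=False)
})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        num_train_epochs=60,             # from this card
        learning_rate=2e-4,              # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```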
Core Capabilities
- Advanced language understanding and generation
- Optimized for multi-turn conversations (see the example after this list)
- Structured response generation
- Bias-mitigated outputs
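As an illustration of the conversational tuning, here is a minimal multi-turn generation sketch using the Hugging Face Transformers API; the repository id is a placeholder, since this card does not state where the weights are published:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zurich-14B-GCv2-10k"  # placeholder: substitute the published repo id or a local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# A multi-turn exchange rendered through the model's chat template.
messages = [
    {"role": "user", "content": "What is Grouped Query Attention?"},
    {"role": "assistant", "content": "GQA shares each key-value head across several query heads."},
    {"role": "user", "content": "And why does that make inference cheaper?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```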
Frequently Asked Questions
Q: What makes this model unique?
A: The model's uniqueness lies in its combination of the powerful Qwen 2.5 architecture with the specialized GammaCorpus v2-10k dataset, optimized through efficient training techniques and architectural components like GQA and RoPE.
Q: What are the recommended use cases?
A: This model is particularly well-suited for applications requiring structured dialogue generation, multi-turn conversations, and general language understanding tasks. It's designed to provide balanced, bias-aware responses while maintaining high performance across various use cases.