Zurich-14B-GCv2-50k
| Property | Value |
|---|---|
| Parameter Count | 14.7B (13.1B non-embedding) |
| Base Model | Qwen 2.5 14B Instruct |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm, and attention QKV bias |
| Training Dataset | GammaCorpus v2-50k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-50k?
Zurich-14B-GCv2-50k is a language model built on Alibaba's Qwen 2.5 14B Instruct and fine-tuned on the GammaCorpus v2-50k dataset. The model is designed for structured dialogue generation and features 48 transformer layers with grouped-query attention: 40 attention heads for queries and 8 for keys and values.
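Below is a minimal inference sketch using the Hugging Face transformers library. The repository id is inferred from the model name and is an assumption; check the actual Hub listing before use.

```python
# Minimal inference sketch; the repo id below is assumed, not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rubenroy/Zurich-14B-GCv2-50k"  # assumed Hub path
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Give me a two-sentence summary of rotary position embeddings."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```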
Implementation Details
Fine-tuning ran on a single A100 GPU for approximately 20 minutes, covering 60 epochs with the Unsloth framework. The architecture retains Qwen 2.5's core components: Rotary Position Embedding (RoPE), SwiGLU activations, RMSNorm, and attention QKV bias.
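The training script and hyperparameters are not published here, so the following is only a plausible sketch of an Unsloth LoRA fine-tune; the base checkpoint name, dataset repo id, text field, LoRA rank, batch size, and learning rate are all assumptions.

```python
# Hypothetical Unsloth fine-tuning sketch; every hyperparameter here is an assumption.
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-14B-Instruct",  # assumed base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,  # LoRA rank/alpha are guesses
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Dataset repo id and text field are assumptions; adapt to the real GammaCorpus schema.
dataset = load_dataset("rubenroy/GammaCorpus-v2-50k", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        num_train_epochs=60,  # epoch count taken from this card
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```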
- 48 transformer layers with Group Query Attention (GQA)
- Optimized attention with a 40/8 head split for queries and key-values (see the sketch after this list)
- Built on the robust Qwen 2.5 architecture
- Trained on structured and filtered multi-turn conversations
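To make the 40/8 split concrete, here is a toy PyTorch illustration of grouped-query attention shapes. The head dimension of 128 is assumed from Qwen 2.5 14B's 5120 hidden size divided by 40 heads.

```python
import torch

batch, seq, head_dim = 1, 16, 128
n_q_heads, n_kv_heads = 40, 8       # the 40/8 split: each KV head serves 5 query heads
group = n_q_heads // n_kv_heads     # 5

q = torch.randn(batch, n_q_heads, seq, head_dim)
k = torch.randn(batch, n_kv_heads, seq, head_dim)
v = torch.randn(batch, n_kv_heads, seq, head_dim)

# GQA: expand each key-value head across its group of query heads, then attend as usual.
k = k.repeat_interleave(group, dim=1)   # (1, 40, 16, 128)
v = v.repeat_interleave(group, dim=1)
attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim**0.5, dim=-1) @ v
print(attn.shape)  # torch.Size([1, 40, 16, 128])
```

Because only 8 key-value heads are cached instead of 40, the KV cache is roughly 5x smaller at inference time.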
Core Capabilities
- Advanced dialogue generation with structured outputs
- Enhanced context understanding through GammaCorpus training
- Efficient performance with optimized attention mechanisms
- Robust text generation with bias mitigation features
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness stems from its fine-tuning on the carefully curated GammaCorpus v2-50k dataset, combined with an architecture that pairs grouped-query attention with rotary position embeddings. GQA shrinks the key-value cache by sharing each key-value head across five query heads, while RoPE encodes relative position directly in the attention computation, keeping generation efficient without sacrificing quality.
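For readers unfamiliar with rotary embeddings, the toy function below sketches the rotate-half RoPE formulation for a single head. Real implementations apply this per attention head to queries and keys and cache the angles; this is a simplified illustration only.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq, head_dim); rotates pairs of dimensions by position-dependent angles.
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half) / half)           # per-pair frequencies
    angles = torch.arange(seq)[:, None] * freqs[None, :]   # (seq, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 128)   # 8 positions, head_dim 128
print(rope(q).shape)      # torch.Size([8, 128])
```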
Q: What are the recommended use cases?
The model excels in structured dialogue generation, making it particularly suitable for conversational AI applications, customer service automation, and general text generation tasks. Its training on filtered multi-turn conversations makes it especially effective for maintaining context in extended dialogues.
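As an illustration of multi-turn use, the sketch below carries the conversation history forward on each turn. As above, the repo id and sampling settings are assumptions.

```python
# Hypothetical multi-turn chat loop; repo id and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "rubenroy/Zurich-14B-GCv2-50k"  # assumed Hub path
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="auto", device_map="auto")

history = [{"role": "system", "content": "You are a helpful assistant."}]
for user_turn in ["What is grouped-query attention?",
                  "How does it differ from standard multi-head attention?"]:
    history.append({"role": "user", "content": user_turn})
    inputs = tok.apply_chat_template(
        history, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
    reply = tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
    history.append({"role": "assistant", "content": reply})  # keep context across turns
    print(reply)
```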