Zurich-14B-GCv2-500k
| Property | Value |
|---|---|
| Base Model | Qwen 2.5 14B Instruct |
| Parameter Count | 14.7B (13.1B Non-Embedding) |
| Architecture | Transformer with RoPE, SwiGLU, RMSNorm |
| Training Dataset | GammaCorpus v2-500k |
| License | Apache 2.0 |
What is Zurich-14B-GCv2-500k?
Zurich-14B-GCv2-500k is a fine-tune of Alibaba's Qwen 2.5 14B Instruct model, trained on the GammaCorpus v2-500k dataset. It pairs Qwen 2.5's architecture with GammaCorpus's structured conversational training data.
Implementation Details
The model inherits Qwen 2.5's 48-layer transformer architecture and uses Grouped Query Attention (GQA) with a 40/8 split between query and key-value heads. Fine-tuning was efficient, completing in approximately 40 minutes on a single A100 GPU via the Unsloth framework, over 60 epochs.
- Grouped Query Attention with 40 query heads and 8 key-value heads
- Implements RoPE (Rotary Position Embedding)
- Uses SwiGLU activation and RMSNorm
- Uses attention QKV bias, as in Qwen 2.5
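To illustrate the 40/8 GQA split listed above, here is a minimal NumPy sketch (illustrative dimensions and random data, not the model's actual weights): each of the 8 key-value heads is shared by a group of 5 query heads.

```python
import numpy as np

# Head configuration described above: 40 query heads, 8 KV heads, 128-dim heads
n_q, n_kv, d = 40, 8, 128
group = n_q // n_kv  # 5 query heads share each key-value head
seq = 4              # toy sequence length

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q, seq, d))
k = rng.standard_normal((n_kv, seq, d))
v = rng.standard_normal((n_kv, seq, d))

# Expand each KV head across its group of query heads (the "repeat_kv" step)
k = np.repeat(k, group, axis=0)  # (40, seq, d)
v = np.repeat(v, group, axis=0)  # (40, seq, d)

# Scaled dot-product attention per head
scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)          # (40, seq, seq)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v                                        # (40, seq, d)
print(out.shape)  # (40, 4, 128)
```

Storing 8 KV heads instead of 40 shrinks the KV cache by 5x, which is the main efficiency benefit of GQA at inference time.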
Core Capabilities
- Enhanced instruction following abilities inherited from Qwen 2.5
- Structured conversation handling from GammaCorpus training
- Efficient processing with optimized attention mechanisms
- Balanced performance across various language tasks
Frequently Asked Questions
Q: What makes this model unique?
The model's uniqueness stems from combining Qwen 2.5's architecture with GammaCorpus v2-500k training data, alongside its GQA configuration and efficient fine-tuning process.
Q: What are the recommended use cases?
This model is particularly well-suited for structured conversations, general language understanding tasks, and applications requiring balanced performance between efficiency and capability.
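For the structured-conversation use case, Qwen-family models expect prompts in the ChatML format; in practice a tokenizer's built-in chat template handles this, but a minimal sketch of the formatting looks like:

```python
def to_chatml(messages):
    """Render {role, content} turns in ChatML (the chat format used by
    Qwen-family models), ending with a generation prompt for the model."""
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    return text + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarise Grouped Query Attention in one sentence."},
]
print(to_chatml(messages))
```

In real use, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` from the Transformers library, which applies the template shipped with the model.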