Zurich-14B-GCv2-50k

Maintained By
rubenroy

Zurich-14B-GCv2-50k

PropertyValue
Parameter Count14.7B (13.1B Non-Embedding)
Base ModelQwen 2.5 14B Instruct
ArchitectureTransformers with RoPE, SwiGLU, RMSNorm, and QKV bias
Training DatasetGammaCorpus v2-50k
LicenseApache 2.0

What is Zurich-14B-GCv2-50k?

Zurich-14B-GCv2-50k is an advanced language model that builds upon Alibaba's Qwen 2.5 14B Instruct model, fine-tuned specifically on the GammaCorpus v2-50k dataset. This model represents a significant advancement in structured dialogue generation, featuring 48 layers and 40 attention heads for queries with 8 for key-values.

Implementation Details

The model underwent a focused fine-tuning process utilizing a single A100 GPU for approximately 20 minutes, completing 60 epochs of training using the Unsloth framework. The architecture implements several sophisticated components including Rotary Position Embedding (RoPE), SwiGLU activations, and RMSNorm, combined with attention QKV bias for enhanced performance.

  • 48 transformer layers with Group Query Attention (GQA)
  • Optimized attention mechanism with 40/8 head split for Q and KV
  • Built on the robust Qwen 2.5 architecture
  • Trained on structured and filtered multi-turn conversations

Core Capabilities

  • Advanced dialogue generation with structured outputs
  • Enhanced context understanding through GammaCorpus training
  • Efficient performance with optimized attention mechanisms
  • Robust text generation with bias mitigation features

Frequently Asked Questions

Q: What makes this model unique?

The model's uniqueness stems from its fine-tuning on the carefully curated GammaCorpus v2-50k dataset, combined with its sophisticated architecture that leverages Group Query Attention and advanced positioning embeddings. This combination allows for more efficient and accurate text generation while maintaining computational efficiency.

Q: What are the recommended use cases?

The model excels in structured dialogue generation, making it particularly suitable for conversational AI applications, customer service automation, and general text generation tasks. Its training on filtered multi-turn conversations makes it especially effective for maintaining context in extended dialogues.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.