Zurich-14B-GCv2-10k

Maintained by: rubenroy


Property            Value
Base Model          Qwen 2.5 14B Instruct
Parameter Count     14.7B (13.1B Non-Embedding)
Architecture        Transformers with RoPE, SwiGLU, RMSNorm, and QKV bias
Training Dataset    GammaCorpus v2-10k
License             Apache 2.0

What is Zurich-14B-GCv2-10k?

Zurich-14B-GCv2-10k is a language model fine-tuned from Alibaba's Qwen 2.5 14B Instruct on the GammaCorpus v2-10k dataset. It aims to outperform comparable models in its size category while showcasing the capabilities of the GammaCorpus dataset.

Implementation Details

The model has 48 layers and uses Grouped Query Attention (GQA) with 40 attention heads for queries and 8 for keys and values. Like its base model, it uses Rotary Position Embedding (RoPE), SwiGLU activation functions, and RMSNorm. Fine-tuning ran on a single A100 GPU for approximately 10 minutes over 60 epochs using the Unsloth framework.

  • Advanced transformer architecture with RoPE and SwiGLU
  • Efficient training implementation using Unsloth framework
  • Optimized with Grouped Query Attention mechanism
  • Built on the robust Qwen 2.5 foundation
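
To make the head counts above concrete, here is a minimal sketch of how 40 query heads can share 8 key-value heads under Grouped Query Attention. The head counts come from the description above; the contiguous grouping (query heads 0 to 4 share key-value head 0, and so on) is an assumption based on the common convention, and the function name is hypothetical rather than part of any library.

```python
# Head counts taken from the model description above.
NUM_QUERY_HEADS = 40
NUM_KV_HEADS = 8

# Number of query heads that share each key-value head.
GROUP_SIZE = NUM_QUERY_HEADS // NUM_KV_HEADS


def kv_head_for_query_head(query_head: int) -> int:
    """Return the key-value head index used by a query head.

    Assumes contiguous grouping (an illustrative convention, not
    something the model card states).
    """
    if not 0 <= query_head < NUM_QUERY_HEADS:
        raise ValueError(f"query_head must be in [0, {NUM_QUERY_HEADS})")
    return query_head // GROUP_SIZE


# Build the full mapping: key-value head -> list of query heads.
groups = {
    kv: [q for q in range(NUM_QUERY_HEADS) if kv_head_for_query_head(q) == kv]
    for kv in range(NUM_KV_HEADS)
}

print(GROUP_SIZE)   # query heads per key-value head
print(groups[0])    # query heads sharing key-value head 0
```

With these counts, each key-value head serves five query heads, so the key-value cache stores 8 heads per layer instead of 40.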

Core Capabilities

  • Advanced language understanding and generation
  • Optimized for multi-turn conversations
  • Structured response generation
  • Bias-mitigated outputs

Frequently Asked Questions

Q: What makes this model unique?

The model pairs the Qwen 2.5 14B Instruct base with a fine-tune on the GammaCorpus v2-10k dataset, trained quickly using the Unsloth framework. Architectural components such as GQA and RoPE are inherited from the Qwen 2.5 base.

Q: What are the recommended use cases?

This model is particularly well-suited for applications requiring structured dialogue generation, multi-turn conversations, and general language understanding tasks. It's designed to provide balanced, bias-aware responses while maintaining high performance across various use cases.
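
As a rough illustration of the multi-turn use case, the sketch below builds a conversation history as a list of role/content messages, the structure commonly passed to chat templates in Hugging Face Transformers. That this model accepts the same format as its Qwen 2.5 base is an assumption, and `append_turn` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List

Message = Dict[str, str]
ALLOWED_ROLES = {"system", "user", "assistant"}


def append_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history with one more message; the input is not mutated."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return history + [{"role": role, "content": content}]


# Build a short multi-turn conversation (contents are placeholders).
history: List[Message] = []
history = append_turn(history, "system", "You are a helpful assistant.")
history = append_turn(history, "user", "Summarize RoPE in one sentence.")
history = append_turn(history, "assistant", "RoPE encodes positions by rotating query and key vectors.")
history = append_turn(history, "user", "How does that differ from learned position embeddings?")

for message in history:
    print(f"{message['role']}: {message['content']}")
```

Each new user turn is appended to the full history, so the model sees the earlier exchange when it generates the next reply.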
