QwQ Bakeneko 32B
| Property | Value |
|---|---|
| Parameter Count | 32 Billion |
| Architecture | 64-layer, 5120-hidden-size transformer |
| License | Apache License 2.0 |
| Release Date | March 13, 2025 |
| Author | rinna |
What is qwq-bakeneko-32b?
QwQ Bakeneko 32B is an instruction-tuned Japanese reasoning model built on the rinna/qwen2.5-bakeneko-32b foundation. It combines Chat Vector merging with Odds Ratio Preference Optimization (ORPO) to deliver stronger performance on Japanese language tasks.
Implementation Details
The model is produced through a multi-stage process that combines model merging and distillation. First, a Chat Vector, calculated by subtracting the parameter vectors of Qwen/Qwen2.5-32B from Qwen/QwQ-32B, is added to the base rinna/qwen2.5-bakeneko-32b model. The merged model is then refined through ORPO training on 1.3k carefully curated data samples generated by DeepSeek-R1 (a sketch of the merging arithmetic follows the list below).
- Advanced 64-layer transformer architecture with 5120 hidden size
- Innovative Chat Vector technology for improved instruction following
- ORPO optimization for enhanced reasoning capabilities
- Comprehensive benchmarking showing superior performance in Japanese tasks
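The Chat Vector step described above amounts to simple parameter arithmetic. The following is a minimal sketch of that arithmetic using transformers and PyTorch; it assumes all three checkpoints share the same Qwen2.5-32B architecture, and it is not rinna's actual merging script, which may exclude certain layers (such as embeddings) from the addition.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the three checkpoints named above (each is roughly 64 GB in bf16).
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B", torch_dtype=torch.bfloat16)
reasoner = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B", torch_dtype=torch.bfloat16)
target = AutoModelForCausalLM.from_pretrained("rinna/qwen2.5-bakeneko-32b", torch_dtype=torch.bfloat16)

base_sd, reasoner_sd = base.state_dict(), reasoner.state_dict()

# Chat Vector = QwQ-32B minus its pre-trained base, Qwen2.5-32B.
# Adding it to qwen2.5-bakeneko-32b transfers QwQ's reasoning and
# instruction-following behavior onto the Japanese-adapted weights.
with torch.no_grad():
    for name, param in target.named_parameters():
        param.add_(reasoner_sd[name] - base_sd[name])

target.save_pretrained("qwq-bakeneko-32b-merged")
```

The ORPO refinement described above is then applied to this merged checkpoint.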
Core Capabilities
- Strong results on Japanese language tasks (average score of 78.31 on the Japanese LM Evaluation Harness)
- Enhanced multi-turn conversation ability (8.52 on Japanese MT-Bench)
- Improved instruction following through Chat Vector technology
- Advanced reasoning capabilities through DeepSeek-R1 distillation
Frequently Asked Questions
Q: What makes this model unique?
The model combines Chat Vector merging with ORPO optimization, an approach that outperforms its predecessors in both single-turn and multi-turn Japanese conversations. Its training recipe specifically targets Japanese language understanding while preserving strong reasoning capabilities.
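For context on what ORPO adds, the sketch below implements the odds-ratio term of the ORPO objective in PyTorch. This is a generic illustration of the technique, not rinna's training code; the `beta` weighting and the use of length-normalized log-probabilities are assumptions drawn from common ORPO implementations.

```python
import torch
import torch.nn.functional as F

def orpo_odds_ratio_loss(chosen_logps: torch.Tensor,
                         rejected_logps: torch.Tensor,
                         beta: float = 0.1) -> torch.Tensor:
    """Odds-ratio penalty from ORPO (illustrative sketch, not rinna's code).

    chosen_logps / rejected_logps: length-normalized log-probabilities of the
    preferred and dispreferred responses under the policy model.
    beta: weight of the penalty relative to the NLL term (assumed value).
    """
    # odds(y|x) = p / (1 - p), so log-odds = log p - log(1 - p).
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Penalize cases where the rejected response has higher odds than the chosen one.
    return -beta * F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()

# Full ORPO objective = standard NLL loss on the chosen responses + this penalty.
```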
Q: What are the recommended use cases?
The model excels at Japanese language tasks, particularly scenarios requiring complex reasoning and multi-turn conversation. It is well suited to applications that demand sophisticated Japanese language understanding, instruction following, and detailed step-by-step reasoning.
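As a starting point, the model can be run through the standard Hugging Face transformers chat interface. The snippet below is an illustrative sketch rather than an official usage example: the repository id follows rinna's naming convention, and the sampling settings (temperature, token budget) are assumptions, not documented recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/qwq-bakeneko-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Japanese prompt: "Prove that there are infinitely many primes."
messages = [{"role": "user", "content": "素数が無限に存在することを証明してください。"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models tend to produce long outputs, so allow a generous budget.
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```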