Qwerky-72B

featherless-ai

Qwerky-72B is a RWKV-based linear attention model converted from Qwen 2.5 72B. It is reported to reduce inference costs by more than 1000x while maintaining competitive benchmark performance.

Property	Value
Parameter Count	72 Billion
Model Type	RWKV Linear Attention
Base Model	Qwen 2.5 72B
Model URL	Hugging Face
Author	featherless-ai

What is Qwerky-72B?

Qwerky-72B converts Qwen 2.5 72B into a RWKV variant, replacing the original attention mechanism with linear attention. The conversion is reported to reduce inference costs by more than 1000x while keeping performance competitive across a range of benchmarks.

Implementation Details

The model uses linear attention through the RWKV architecture, which makes inference substantially cheaper to compute while keeping benchmark performance competitive. Notably, the conversion did not require pretraining or rebuilding the model from scratch.

Supports approximately 30 languages inherited from the Qwen model family
Demonstrates improved performance over the preview model iteration
Achieves comparable or better results than the original Qwen model in several benchmarks
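
To illustrate why linear attention lowers inference cost, the following is a minimal sketch of generic kernelized linear attention in pure Python. It is not RWKV's actual formulation (which adds learned decay and gating) and it is not taken from the Qwerky-72B code; the feature map `phi`, the function names, and the toy dimensions are all assumptions chosen for illustration. The point it shows is that the recurrent form carries a fixed-size state from token to token, yet produces the same outputs as a form that re-reads the whole history at every step.

```python
import math
import random


def phi(x):
    # Positive feature map elu(x) + 1 (an illustrative choice, not RWKV's).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention_parallel(qs, ks, vs):
    """Reference form: at each step t, re-read every earlier key and value."""
    outputs = []
    for t in range(len(qs)):
        q = phi(qs[t])
        num = [0.0] * len(vs[0])
        den = 0.0
        for s in range(t + 1):
            k = phi(ks[s])
            w = sum(a * b for a, b in zip(q, k))  # similarity weight
            den += w
            for j, vj in enumerate(vs[s]):
                num[j] += w * vj
        outputs.append([n / den for n in num])
    return outputs


def causal_linear_attention_recurrent(qs, ks, vs):
    """Recurrent form: carry a fixed-size state instead of the full history."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) outer v
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q_raw, k_raw, v in zip(qs, ks, vs):
        q, k = phi(q_raw), phi(k_raw)
        for i in range(d_k):
            norm[i] += k[i]
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        den = sum(q[i] * norm[i] for i in range(d_k))
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) / den for j in range(d_v)]
        )
    return outputs


def random_sequence(length, dim, rng):
    # Toy inputs; real models use learned projections of token embeddings.
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(length)]


rng = random.Random(0)
qs, ks, vs = (random_sequence(6, 4, rng) for _ in range(3))
parallel_out = causal_linear_attention_parallel(qs, ks, vs)
recurrent_out = causal_linear_attention_recurrent(qs, ks, vs)
max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(parallel_out, recurrent_out)
    for x, y in zip(row_a, row_b)
)
print(f"max difference between the two forms: {max_diff:.2e}")
```

In the recurrent form, the work and memory needed per generated token depend only on the state dimensions, not on how many tokens came before. Standard softmax attention, by contrast, must keep and scan a cache that grows with context length.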

Core Capabilities

Strong performance in ARC Challenge (63.82%) and ARC Easy (84.43%)
Exceptional accuracy in SciQ tasks (96.70%)
Improved Winogrande performance (79.56%) compared to the base model
Competitive MMLU scores (77.46%)

Frequently Asked Questions

Q: What makes this model unique?

What sets the model apart is that it was converted to the RWKV architecture from an existing model rather than trained from scratch. The conversion sharply reduces inference cost while maintaining high benchmark performance, which makes the model more practical to deploy widely.

Q: What are the recommended use cases?

Given its efficient architecture and strong benchmark results, the model is well suited to applications that need fast inference without sacrificing accuracy. It fits best on tasks involving reasoning, science questions, and multilingual processing within its supported languages.