# RWKV-6 World
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Research Paper | arXiv:2404.05892 |
| Supported Languages | 12 (including English, Chinese, French, Spanish, etc.) |
| Training Data Size | 1.42T tokens (v2.1) |
## What is RWKV-6 World?
RWKV-6 World is a language model trained on a diverse collection of datasets including SlimPajama, The Pile, StarCoder, and OSCAR. It is a significant evolution of the RWKV architecture; the 7B-parameter v3 model scores 54.2% on the MMLU benchmark.
## Implementation Details
The model is implemented in PyTorch and requires the `rwkv` pip package (version 0.8.24 or later) for inference. Its training data is composed of roughly 70% English, 15% multilingual content, and 15% code, which makes it versatile across chat, QA, and coding applications.
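As a rough illustration (assuming the 70/15/15 split applies to the full 1.42T-token v2.1 corpus, which the card does not state explicitly), the approximate token budget per slice works out as follows:

```python
TOTAL_TOKENS = 1.42e12  # v2.1 training corpus size from the table above

# Stated training composition of RWKV-6 World
composition = {"english": 0.70, "multilingual": 0.15, "code": 0.15}

# Approximate token count per slice (assumption: split applies to the whole corpus)
tokens = {name: share * TOTAL_TOKENS for name, share in composition.items()}
for name, count in tokens.items():
    print(f"{name:12s} ~{count / 1e12:.3f}T tokens")
```

This yields roughly 0.994T English tokens and about 0.213T tokens each for the multilingual and code slices.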
- Trained on multiple high-quality datasets, including Wikipedia and ChatGPT data
- Supports 12 languages, including major Asian and European languages
- Uses the specialized `rwkv_vocab_v20230424` tokenizer
- Optimized for both chat and question-answering (QA) tasks
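A minimal inference sketch using the `rwkv` pip package (version 0.8.24 or later), not a definitive implementation. The checkpoint filename, strategy string, and sampling settings below are assumptions; substitute the actual downloaded weights. Imports are deferred inside the function so the sketch reads cleanly even without the package installed.

```python
def generate_reply(prompt: str, model_path: str = "RWKV-x060-World-7B.pth") -> str:
    """Load an RWKV-6 World checkpoint and generate a completion.

    `model_path` is a placeholder -- point it at a real checkpoint file.
    Requires: pip install "rwkv>=0.8.24" torch
    """
    import os
    os.environ.setdefault("RWKV_JIT_ON", "1")   # enable TorchScript JIT
    os.environ.setdefault("RWKV_CUDA_ON", "0")  # "1" to build the CUDA kernel

    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE, PIPELINE_ARGS

    model = RWKV(model=model_path, strategy="cuda fp16")  # or "cpu fp32"
    pipeline = PIPELINE(model, "rwkv_vocab_v20230424")    # the World tokenizer
    args = PIPELINE_ARGS(temperature=1.0, top_p=0.7)
    return pipeline.generate(prompt, token_count=128, args=args)
```

The `strategy` string controls device placement and precision; `"cpu fp32"` trades speed for broad compatibility, while `"cuda fp16"` needs a GPU with enough memory for the 7B weights.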
## Core Capabilities
- Multi-language text generation and understanding
- Code generation and analysis
- Question-answering capabilities
- Chat-based interactions with formatted prompting
- High MMLU performance (54.2% for the 7B v3 model)
## Frequently Asked Questions
Q: What makes this model unique?
RWKV-6 World stands out for an efficient architecture that combines transformer-level quality with RNN-like inference characteristics, while supporting 12 languages and performing strongly on standard benchmarks.
Q: What are the recommended use cases?
The model is well-suited to chat applications, question-answering systems, and code generation. Using the recommended prompt formats for chat and QA scenarios improves results in both general conversation and specialized tasks.
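The prompt formats mentioned above can be sketched as small helpers. The `User:`/`Assistant:` and `Question:`/`Answer:` templates below follow the conventions commonly documented for RWKV World models; verify them against the model repository before relying on them.

```python
def chat_prompt(user_message: str) -> str:
    """Chat-style prompt commonly used with RWKV World models."""
    return f"User: {user_message.strip()}\n\nAssistant:"


def qa_prompt(question: str) -> str:
    """Question-answering prompt commonly used with RWKV World models."""
    return f"Question: {question.strip()}\n\nAnswer:"


print(chat_prompt("What is RWKV?"))
```

The formatted string is then passed to the generation pipeline, which completes the text after `Assistant:` or `Answer:`.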