35b-beta-long
Property | Value |
---|---|
Parameter Count | 35B |
Tensor Type | BF16 |
License | WTFPL |
Languages | English, Chinese, Japanese, German |
Context Length | 128K |
What is 35b-beta-long?
35b-beta-long is an advanced multilingual language model built upon Cohere's 35B-parameter architecture. This model represents a significant advancement in long-context language processing, featuring extensive training on over 30 million multi-turn dialogue entries. The model utilizes CohereForAI/c4ai-command-r-v01 as its foundation, chosen specifically for its superior responsiveness to high-quality training data.
Implementation Details
The model employs a sophisticated training approach incorporating BF16 precision and a full 128K context window. The training process involved synthesis of data from multiple web-pages and documents, with substantial human oversight to ensure quality. The architecture leverages existing SOTA LLMs combined with human guidance for enhanced information synthesis.
- Trained on 18 diverse datasets including GuanacoDataset, MetaMathQA, and WizardLM
- Implements ChatML template for tokenization
- Features basic safety measures using refusal datasets
- Optimized for long-context performance without specific formatting requirements
Core Capabilities
- Enhanced long-context processing up to 128K tokens
- Reduced hallucination tendency through fact-based training
- Improved mathematical and coding capabilities
- Superior knowledge recall and thematic summarization
- Multi-language support across English, Chinese, Japanese, and German
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its extensive training on synthesized dialogue data and its ability to handle long contexts effectively without compromising performance. It demonstrates capabilities comparable to models twice its size while maintaining high accuracy in information recall and synthesis.
Q: What are the recommended use cases?
The model excels in scenarios requiring long document processing, multi-language support, and complex information synthesis. It's particularly suitable for document analysis, thematic summarization, and general conversational tasks across supported languages.