fairseq-dense-6.7B
| Property | Value |
|---|---|
| Parameter Count | 6.7 Billion |
| Author | KoboldAI |
| Original Source | Facebook AI Research (FAIR) |
| Paper | Efficient Large Scale Language Modeling with Mixtures of Experts |
What is fairseq-dense-6.7B?
fairseq-dense-6.7B is a Hugging Face transformers-compatible conversion of Facebook's original dense 6.7B-parameter language model. It corresponds to the dense baseline described in the paper listed above and offers solid performance across a range of natural language processing tasks.
Implementation Details
The model has been converted to work with the Hugging Face transformers library, making it accessible to the broader AI community. It keeps the dense architecture of the original Facebook model while remaining compatible with modern NLP pipelines; a loading sketch follows the list below.
- Architecture: Dense transformer-based model with 6.7B parameters
- Framework: Hugging Face transformers-compatible
- Original implementation: Based on Facebook's fairseq framework
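As a hedged sketch rather than official usage guidance, the snippet below shows how the converted checkpoint could be loaded through the standard transformers auto classes. The Hub ID `KoboldAI/fairseq-dense-6.7B` is assumed from the title and author above, and the fp16/device settings are illustrative.

```python
# Minimal loading sketch, assuming the Hub ID "KoboldAI/fairseq-dense-6.7B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KoboldAI/fairseq-dense-6.7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 13 GB of weights in fp16
    device_map="auto",          # requires the `accelerate` package
)

prompt = "The field of natural language processing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```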
Core Capabilities
Reported benchmark accuracies:
- HellaSwag (10-shot): 71.26% accuracy
- Winogrande (5-shot): 65.27% accuracy
- ARC (25-shot): 39.42% accuracy
- TruthfulQA (0-shot): 32.73% accuracy
- MMLU (5-shot): 26.91% accuracy
- DROP (3-shot): 17.05% accuracy
Frequently Asked Questions
Q: What makes this model unique?
The model combines a straightforward dense transformer architecture with strong commonsense language understanding for its size, scoring particularly well on few-shot benchmarks such as HellaSwag and Winogrande.
Q: What are the recommended use cases?
Based on its benchmark results, the model is well suited to commonsense reasoning and general natural language understanding, particularly in few-shot settings, and it handles multiple-choice question answering of the kind used in the benchmarks above.
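To illustrate the multiple-choice setting mentioned above, here is a minimal sketch of log-likelihood scoring, the standard way causal language models are evaluated on benchmarks such as HellaSwag and ARC. It assumes `model` and `tokenizer` were loaded as in the earlier snippet; `choice_logprob` is a hypothetical helper introduced here, and the boundary between prompt and choice tokens is handled only approximately.

```python
# Hedged sketch: pick the answer the model assigns the highest log-likelihood.
import torch
import torch.nn.functional as F

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to `choice` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given the tokens before it.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Approximate: treat the last (full - prompt) tokens as the choice tokens.
    choice_len = full_ids.shape[1] - prompt_ids.shape[1]
    return token_logprobs[0, -choice_len:].sum().item()

prompt = "Q: Which of these animals barks?\nA:"
choices = [" The dog.", " The cat.", " The goldfish."]
scores = [choice_logprob(prompt, c) for c in choices]
print(choices[scores.index(max(scores))])  # highest-scoring choice wins
```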