fairseq-dense-6.7B
| Property | Value |
|---|---|
| Parameter Count | 6.7 Billion |
| Author | KoboldAI |
| Original Source | Facebook AI Research (FAIR) |
| Paper | Efficient Large Scale Language Modeling with Mixtures of Experts |
What is fairseq-dense-6.7B?
fairseq-dense-6.7B is a Hugging Face transformers-compatible conversion of Facebook's original dense 6.7B-parameter language model. It corresponds to the dense baseline described in the paper listed above and offers solid performance across a range of natural language processing tasks.
Implementation Details
The model has been converted to work with the Hugging Face transformers library, making it accessible to the broader AI community. It keeps the dense architecture of the original Facebook model while remaining compatible with modern NLP pipelines; a loading sketch follows the list below.
- Architecture: Dense transformer-based model with 6.7B parameters
- Framework: Hugging Face transformers-compatible
- Original implementation: Based on Facebook's fairseq framework
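As a hedged sketch rather than official usage guidance, the snippet below shows how the converted checkpoint could be loaded through the standard transformers auto classes. The Hub ID `KoboldAI/fairseq-dense-6.7B` is assumed from the title and author above, and the fp16/device settings are illustrative.

```python
# Minimal loading sketch, assuming the Hub ID "KoboldAI/fairseq-dense-6.7B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KoboldAI/fairseq-dense-6.7B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 13 GB of weights in fp16
    device_map="auto",          # requires the `accelerate` package
)

prompt = "The field of natural language processing"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```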
Core Capabilities
Reported benchmark accuracies:
- HellaSwag (10-shot): 71.26% accuracy
- Winogrande (5-shot): 65.27% accuracy
- ARC (25-shot): 39.42% accuracy
- TruthfulQA (0-shot): 32.73% accuracy
- MMLU (5-shot): 26.91% accuracy
- DROP (3-shot): 17.05% accuracy
Frequently Asked Questions
Q: What makes this model unique?
The model combines a straightforward dense transformer architecture with strong commonsense language understanding for its size, scoring particularly well on few-shot benchmarks such as HellaSwag and Winogrande.
Q: What are the recommended use cases?
Based on its benchmark results, the model is well suited to commonsense reasoning and general natural language understanding, particularly in few-shot settings, and it handles multiple-choice question answering of the kind used in the benchmarks above.
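To illustrate the multiple-choice setting mentioned above, here is a minimal sketch of log-likelihood scoring, the standard way causal language models are evaluated on benchmarks such as HellaSwag and ARC. It assumes `model` and `tokenizer` were loaded as in the earlier snippet; `choice_logprob` is a hypothetical helper introduced here, and the boundary between prompt and choice tokens is handled only approximately.

```python
# Hedged sketch: pick the answer the model assigns the highest log-likelihood.
import torch
import torch.nn.functional as F

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to `choice` given `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(prompt + choice, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given the tokens before it.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Approximate: treat the last (full - prompt) tokens as the choice tokens.
    choice_len = full_ids.shape[1] - prompt_ids.shape[1]
    return token_logprobs[0, -choice_len:].sum().item()

prompt = "Q: Which of these animals barks?\nA:"
choices = [" The dog.", " The cat.", " The goldfish."]
scores = [choice_logprob(prompt, c) for c in choices]
print(choices[scores.index(max(scores))])  # highest-scoring choice wins
```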