XtremeDistil-L12-H384-Uncased
| Property | Value |
|---|---|
| Parameters | 33 Million |
| Speed Improvement | 2.7x over BERT-base |
| License | MIT |
| Author | Microsoft |
| Paper | XtremeDistilTransformers Paper |
What is xtremedistil-l12-h384-uncased?
XtremeDistil-L12-H384-Uncased is a highly efficient distilled transformer model developed by Microsoft that combines task transfer with multi-task distillation techniques. With 12 layers and a hidden size of 384, it achieves remarkable performance while significantly reducing computational requirements compared to BERT-base.
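A minimal loading sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the identifier microsoft/xtremedistil-l12-h384-uncased:

```python
# Minimal usage sketch (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("XtremeDistil is a compact transformer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final-layer hidden states: [batch_size, sequence_length, hidden_size=384]
print(outputs.last_hidden_state.shape)
```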
Implementation Details
The model applies distillation techniques from both the XtremeDistil and MiniLM approaches, yielding a compact yet capable architecture that reaches an 88.5% average score across GLUE benchmark tasks. It uses 12 attention heads and maintains strong performance across a range of NLP tasks (see the configuration check after the list below).
- Architecture: 12 layers with 384 hidden dimensions
- Parameter Reduction: 33M parameters (compared to BERT-base's 109M)
- Performance: 87.2% on MNLI, 91.9% on QNLI, 80.2% on SQuAD2
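As a sanity check, the architecture and parameter numbers above can be read straight from the model itself; a short sketch, assuming the same Hub checkpoint name as before:

```python
# Config/parameter check sketch: printed values should match the list above.
from transformers import AutoConfig, AutoModel

name = "microsoft/xtremedistil-l12-h384-uncased"
config = AutoConfig.from_pretrained(name)
print(config.num_hidden_layers)    # 12 layers
print(config.hidden_size)          # 384 hidden dimensions
print(config.num_attention_heads)  # 12 attention heads

model = AutoModel.from_pretrained(name)
print(sum(p.numel() for p in model.parameters()))  # roughly 33M parameters
```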
Core Capabilities
- Text classification (a fine-tuning sketch follows this list)
- Question answering (80.2% on SQuAD2)
- Natural language understanding
- Transfer learning applications
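The base checkpoint ships without task heads, so downstream use means attaching and fine-tuning one. A hedged sketch for binary text classification; the head is randomly initialized, and the example texts and labels are purely illustrative:

```python
# Fine-tuning sketch: attach a randomly initialized classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Illustrative mini-batch; a real run would iterate over a labeled dataset.
batch = tokenizer(["great service", "never again"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

loss = model(**batch, labels=labels).loss  # standard cross-entropy
loss.backward()                            # gradients for an optimizer step
```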
Frequently Asked Questions
Q: What makes this model unique?
A: This model stands out for its ability to maintain high performance (88.5% average on GLUE) while achieving a 2.7x speedup over BERT-base, making it particularly suitable for production environments where efficiency is crucial.
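The 2.7x figure is a reported average; actual gains depend on hardware, batch size, and sequence length. A rough way to measure it yourself, assuming both checkpoints are available from the Hub:

```python
# Rough CPU latency comparison sketch; numbers will vary by machine.
import time
import torch
from transformers import AutoModel, AutoTokenizer

def mean_latency_ms(name, text, runs=20):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
    return (time.perf_counter() - start) / runs * 1000

text = "Distilled transformers trade a little accuracy for a lot of speed."
for name in ("bert-base-uncased", "microsoft/xtremedistil-l12-h384-uncased"):
    print(f"{name}: {mean_latency_ms(name, text):.1f} ms")
```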
Q: What are the recommended use cases?
A: The model is ideal for general-purpose NLP tasks, including text classification, question answering, and natural language understanding, especially in scenarios where computational efficiency matters and only a small accuracy trade-off is acceptable.
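One illustrative efficiency-oriented use, again assuming the same Hub checkpoint: mean-pooled sentence embeddings for lightweight semantic similarity. Note the base model is not tuned for this, so treat it as a starting point rather than a drop-in sentence encoder.

```python
# Sketch: mean-pooled sentence embeddings from the final hidden layer.
import torch
from transformers import AutoModel, AutoTokenizer

name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

sentences = ["How do I reset my password?", "password reset instructions"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # [2, seq_len, 384]

mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding tokens
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {score.item():.3f}")
```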