XtremeDistil-L12-H384-Uncased
| Property | Value |
|---|---|
| Parameters | 33 Million |
| Speed Improvement | 2.7x over BERT-base |
| License | MIT |
| Author | Microsoft |
| Paper | XtremeDistilTransformers Paper |
What is xtremedistil-l12-h384-uncased?
XtremeDistil-L12-H384-Uncased is a highly efficient distilled transformer model developed by Microsoft that combines task transfer with multi-task distillation techniques. With 12 layers and a hidden size of 384, it achieves remarkable performance while significantly reducing computational requirements compared to BERT-base.
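A minimal loading sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub under the identifier microsoft/xtremedistil-l12-h384-uncased:

```python
# Minimal usage sketch (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("XtremeDistil is a compact transformer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Final-layer hidden states: [batch_size, sequence_length, hidden_size=384]
print(outputs.last_hidden_state.shape)
```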
Implementation Details
The model applies distillation techniques from both the XtremeDistil and MiniLM approaches, yielding a compact yet capable architecture that reaches an 88.5% average score across GLUE benchmark tasks. It uses 12 attention heads and maintains strong performance across a range of NLP tasks (see the configuration check after the list below).
- Architecture: 12 layers with 384 hidden dimensions
- Parameter Reduction: 33M parameters (compared to BERT-base's 109M)
- Performance: 87.2% on MNLI, 91.9% on QNLI, 80.2% on SQuAD2
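As a sanity check, the architecture and parameter numbers above can be read straight from the model itself; a short sketch, assuming the same Hub checkpoint name as before:

```python
# Config/parameter check sketch: printed values should match the list above.
from transformers import AutoConfig, AutoModel

name = "microsoft/xtremedistil-l12-h384-uncased"
config = AutoConfig.from_pretrained(name)
print(config.num_hidden_layers)    # 12 layers
print(config.hidden_size)          # 384 hidden dimensions
print(config.num_attention_heads)  # 12 attention heads

model = AutoModel.from_pretrained(name)
print(sum(p.numel() for p in model.parameters()))  # roughly 33M parameters
```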
Core Capabilities
- Text classification (a fine-tuning sketch follows this list)
- Question answering (80.2% on SQuAD2)
- Natural language understanding
- Transfer learning applications
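The base checkpoint ships without task heads, so downstream use means attaching and fine-tuning one. A hedged sketch for binary text classification; the head is randomly initialized, and the example texts and labels are purely illustrative:

```python
# Fine-tuning sketch: attach a randomly initialized classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Illustrative mini-batch; a real run would iterate over a labeled dataset.
batch = tokenizer(["great service", "never again"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

loss = model(**batch, labels=labels).loss  # standard cross-entropy
loss.backward()                            # gradients for an optimizer step
```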
Frequently Asked Questions
Q: What makes this model unique?
A: This model stands out for its ability to maintain high performance (88.5% average on GLUE) while achieving a 2.7x speedup over BERT-base, making it particularly suitable for production environments where efficiency is crucial.
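The 2.7x figure is a reported average; actual gains depend on hardware, batch size, and sequence length. A rough way to measure it yourself, assuming both checkpoints are available from the Hub:

```python
# Rough CPU latency comparison sketch; numbers will vary by machine.
import time
import torch
from transformers import AutoModel, AutoTokenizer

def mean_latency_ms(name, text, runs=20):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name).eval()
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        model(**inputs)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(**inputs)
    return (time.perf_counter() - start) / runs * 1000

text = "Distilled transformers trade a little accuracy for a lot of speed."
for name in ("bert-base-uncased", "microsoft/xtremedistil-l12-h384-uncased"):
    print(f"{name}: {mean_latency_ms(name, text):.1f} ms")
```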
Q: What are the recommended use cases?
A: The model is ideal for general-purpose NLP tasks, including text classification, question answering, and natural language understanding, especially in scenarios where computational efficiency matters and only a small accuracy trade-off is acceptable.
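One illustrative efficiency-oriented use, again assuming the same Hub checkpoint: mean-pooled sentence embeddings for lightweight semantic similarity. Note the base model is not tuned for this, so treat it as a starting point rather than a drop-in sentence encoder.

```python
# Sketch: mean-pooled sentence embeddings from the final hidden layer.
import torch
from transformers import AutoModel, AutoTokenizer

name = "microsoft/xtremedistil-l12-h384-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

sentences = ["How do I reset my password?", "password reset instructions"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # [2, seq_len, 384]

mask = batch["attention_mask"].unsqueeze(-1)       # zero out padding tokens
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {score.item():.3f}")
```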