XtremeDistil-L12-H384-Uncased

Maintained by Microsoft

Parameters: 33 Million
Speed Improvement: 2.7x over BERT-base
License: MIT
Author: Microsoft
Paper: XtremeDistilTransformers Paper

What is xtremedistil-l12-h384-uncased?

XtremeDistil-L12-H384-Uncased is a highly efficient distilled transformer model developed by Microsoft that combines task transfer with multi-task distillation techniques. With 12 layers and a hidden size of 384, it achieves remarkable performance while significantly reducing computational requirements compared to BERT-base.
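At the heart of this kind of distillation is training the small student to match the teacher's softened output distribution. The actual XtremeDistil recipe adds staged task transfer and multi-task distillation on internal representations, so the following is only a minimal sketch of the classic soft-label objective (Hinton et al., 2015), not Microsoft's full training procedure:

```python
import math

def softened(logits, T=2.0):
    """Temperature-scaled softmax: higher T flattens the distribution,
    exposing the teacher's 'dark knowledge' about near-miss classes."""
    z = [l / T for l in logits]
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients keep a consistent magnitude across temperatures."""
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl

teacher = [2.0, 0.5, -1.0]
loss_match = distillation_loss(teacher, teacher)        # ~0: student equals teacher
loss_diverged = distillation_loss([0.0, 0.0, 0.0], teacher)
```

In practice this soft-label term is combined with the ordinary cross-entropy on gold labels, and the student's smaller hidden size (384 vs. 768) is what yields the parameter and speed savings.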

Implementation Details

The model utilizes advanced distillation techniques from both XtremeDistil and MiniLM approaches, resulting in a powerful yet efficient architecture that achieves 88.5% average score across GLUE benchmarks. It features 12 attention heads and maintains strong performance across various NLP tasks.

  • Architecture: 12 layers with 384 hidden dimensions
  • Parameter Reduction: 33M parameters (compared to BERT's 109M)
  • Performance: 87.2% on MNLI, 91.9% on QNLI, 80.2% on SQuAD2
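The 33M figure can be reproduced from the architecture with back-of-envelope arithmetic, assuming a standard BERT-style encoder (30,522-token WordPiece vocabulary, 512 positions, and the conventional 4×H feed-forward width of 1536):

```python
# Rough parameter count for a BERT-style encoder:
# L=12 layers, H=384 hidden, I=1536 intermediate,
# V=30522 vocab, 512 positions, 2 token types.
V, P, T, H, I, L = 30522, 512, 2, 384, 1536, 12

embeddings = V * H + P * H + T * H + 2 * H          # word/pos/type tables + LayerNorm
attention  = 4 * (H * H + H)                        # Q, K, V, output projections (+ biases)
layer      = attention + 2 * H                      # attention LayerNorm
layer     += (H * I + I) + (I * H + H) + 2 * H      # FFN up/down projections + LayerNorm
total      = embeddings + L * layer

print(f"{total:,}")  # 33,212,160 -> ~33M
```

The same arithmetic with H=768 and I=3072 gives BERT-base's ~109M, so roughly a third of the small model's budget sits in the embedding table alone.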

Core Capabilities

  • Text Classification tasks
  • Question Answering (SQuAD2 performance: 80.2%)
  • Natural Language Understanding
  • Transfer Learning Applications
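Because it uses a standard BERT architecture, the checkpoint loads as a drop-in replacement via the Hugging Face transformers library. A sketch (`load_model` is an illustrative helper, and a task-specific head plus fine-tuning is still needed before classification or QA):

```python
MODEL_ID = "microsoft/xtremedistil-l12-h384-uncased"

def load_model():
    # Requires: pip install transformers torch
    # Imported inside the helper so the sketch stays importable
    # even when the dependency is absent.
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    inputs = tokenizer("XtremeDistil is a compact BERT.", return_tensors="pt")
    outputs = model(**inputs)
    # Hidden size is 384, so encodings are (batch, seq_len, 384).
    print(outputs.last_hidden_state.shape)
```

For classification or QA, swap `AutoModel` for `AutoModelForSequenceClassification` or `AutoModelForQuestionAnswering` and fine-tune on the target task.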

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to maintain high performance (88.5% average on GLUE) while achieving a 2.7x speedup over BERT-base, making it particularly suitable for production environments where efficiency is crucial.

Q: What are the recommended use cases?

The model is well suited to general-purpose NLP tasks including text classification, question answering, and natural language understanding, especially in scenarios where computational efficiency matters and a small accuracy trade-off is acceptable.
