xtremedistil-l6-h256-uncased

microsoft

A lightweight BERT-based model with 6 layers and 256 hidden dimensions, achieving 8.7x speedup over BERT-base while maintaining strong performance on NLP tasks.

Property    Value
Parameters  13 million
License     MIT
Paper       XtremeDistilTransformers Paper
Author      Microsoft

What is xtremedistil-l6-h256-uncased?

XtremeDistil-L6-H256-Uncased is a distilled transformer model developed by Microsoft. With 6 layers and a hidden size of 256, it is much smaller than BERT-base and runs 8.7x faster while remaining competitive across a range of NLP tasks.

Implementation Details

The model is trained with a knowledge distillation approach that combines task transfer with multi-task distillation. It has 6 transformer layers and a hidden size of 256, for about 13 million parameters, a fraction of BERT-base's 109 million.

  • Architecture: 6 transformer layers
  • Hidden size: 256 dimensions
  • Speed improvement: 8.7x faster than BERT-base
  • Framework compatibility: TensorFlow 2.3.1, PyTorch 1.6.0
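
The parameter figures above can be sanity-checked with a rough count for a BERT-style encoder. This is a minimal sketch, not the authors' accounting: the vocabulary size (30,522, the usual uncased BERT vocabulary), the 512 position embeddings, the 2 segment types, and the feed-forward width of 4x the hidden size are assumptions that the text above does not state.

```python
def bert_param_count(layers, hidden, vocab=30522, max_pos=512, type_vocab=2, ffn_mult=4):
    """Estimate the parameter count of a BERT-style encoder (with pooler).

    vocab, max_pos, type_vocab and ffn_mult are assumed defaults, not values
    taken from the model card.
    """
    inter = ffn_mult * hidden
    layer_norm = 2 * hidden  # one scale and one bias vector

    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections (weights + biases).
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward block: hidden -> inter -> hidden (weights + biases).
    feed_forward = (hidden * inter + inter) + (inter * hidden + hidden) + layer_norm

    # Pooler: one dense layer applied to the first token's hidden state.
    pooler = hidden * hidden + hidden
    return embeddings + layers * (attention + feed_forward) + pooler


small = bert_param_count(layers=6, hidden=256)
base = bert_param_count(layers=12, hidden=768)
print(f"6 layers, hidden 256:  {small / 1e6:.1f}M parameters")
print(f"12 layers, hidden 768: {base / 1e6:.1f}M parameters")
```

Under these assumptions the estimates land close to the 13 million and 109 million figures quoted above. Most of the small model's parameters sit in the token embedding table rather than in the transformer layers.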

Core Capabilities

  • Strong performance on GLUE benchmark tasks
  • Excellent results on SQuAD 2.0 question answering
  • Task-agnostic architecture suitable for transfer learning
  • Efficient inference with minimal computational requirements

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its efficiency-to-performance ratio: it reaches an average score of 85.6 across the reported NLP benchmarks while running 8.7x faster than BERT-base. It keeps this level of performance with roughly 13 million parameters, compared with BERT-base's 109 million.

Q: What are the recommended use cases?

The model is ideal for applications requiring efficient NLP processing, including text classification, question answering, and general language understanding tasks where computational resources are limited but high performance is still needed.
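
For use cases like these, one common starting point is to extract sentence vectors from the encoder and feed them to a small classifier. The sketch below assumes the checkpoint is published on the Hugging Face Hub as `microsoft/xtremedistil-l6-h256-uncased` and that `torch` and `transformers` are installed. The `embed` helper and the mean-pooling step are illustrative choices, not something the model card prescribes.

```python
MODEL_ID = "microsoft/xtremedistil-l6-h256-uncased"  # assumed Hub identifier


def embed(texts, model_id=MODEL_ID):
    """Return one mean-pooled vector per input text (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()

    # Pad to the longest text in the batch and truncate overly long inputs.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)

    # Average over real tokens only; the attention mask zeroes out padding.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

Calling `embed(["The delivery was late.", "Where is the station?"])` downloads the checkpoint on first use and returns one row per input text, each with 256 entries to match the hidden size. For question answering or higher accuracy on classification, fine-tuning the full model on task data is the more usual route.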
