MiniLM-L12-H384-uncased


MiniLM is a compact 33M-parameter language model that achieves BERT-level performance while being 2.7x faster. Ideal for efficient NLP tasks.

  • Parameters: 33M
  • License: MIT
  • Author: Microsoft
  • Paper: "MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers" (Wang et al., 2020)
  • Architecture: 12 layers, 384 hidden size, 12 attention heads

What is MiniLM-L12-H384-uncased?

MiniLM is a compressed transformer model developed by Microsoft that achieves strong performance at a fraction of BERT's size. This uncased version has 12 layers and a hidden size of 384, for a total of just 33M parameters - a significant reduction from BERT-Base's 109M - while running 2.7x faster.

Implementation Details

The model is compressed with deep self-attention distillation, which transfers the teacher's self-attention behavior to a smaller student while preserving its task-agnostic capabilities. It is designed as a drop-in replacement for BERT, and like BERT it must be fine-tuned on a downstream task before deployment; a minimal loading sketch follows the list below.

  • Achieves comparable or better performance than BERT-Base on various NLP tasks
  • Features improved efficiency with 33M parameters (vs BERT's 109M)
  • Implements a 12-layer architecture with 384 hidden dimensions
  • Supports uncased text processing
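Because the checkpoint is BERT-compatible, it loads like any BERT-style model. The sketch below is a minimal illustration, assuming PyTorch, the Hugging Face transformers library, and the microsoft/MiniLM-L12-H384-uncased checkpoint named in the card above.

```python
# Minimal sketch: load MiniLM as a drop-in BERT replacement.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/MiniLM-L12-H384-uncased")
model = AutoModel.from_pretrained("microsoft/MiniLM-L12-H384-uncased")

inputs = tokenizer("MiniLM is a compact transformer.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Hidden states have shape (batch, seq_len, 384), matching the
# 384-dimensional hidden size of the 12-layer architecture.
print(outputs.last_hidden_state.shape)
```

Since the model is uncased, the tokenizer lowercases input text automatically.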

Core Capabilities

All scores below are obtained after task-specific fine-tuning:

  • Strong performance on SQuAD 2.0 (81.7 vs BERT-Base's 76.8)
  • Excellent MNLI-m accuracy (85.7)
  • High scores on SST-2 (93.0) and QNLI (91.5)
  • Effective on MRPC (89.5) and QQP (91.3)

Frequently Asked Questions

Q: What makes this model unique?

MiniLM's uniqueness lies in its ability to maintain BERT-level performance while significantly reducing model size through deep self-attention distillation, making it 2.7x faster than BERT-Base.
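To make the distillation idea concrete, here is a heavily simplified sketch of the attention-transfer term: the student's last-layer self-attention distributions are pushed toward the teacher's with a KL loss. The function name, tensor shapes, and single-term setup are illustrative assumptions, not the paper's exact objective (which also transfers value relations between teacher and student).

```python
# Simplified sketch of the attention-transfer term in deep
# self-attention distillation. Shapes and names are assumptions.
import torch

def attention_kl_loss(teacher_attn: torch.Tensor,
                      student_attn: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over self-attention distributions.

    Both inputs: (batch, heads, seq_len, seq_len), each row already
    softmax-normalized over the last dimension.
    """
    eps = 1e-12  # avoid log(0) on exactly-zero attention weights
    kl = teacher_attn * ((teacher_attn + eps).log()
                         - (student_attn + eps).log())
    # Sum over each attention distribution, average everything else.
    return kl.sum(dim=-1).mean()
```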

Q: What are the recommended use cases?

The model is particularly well-suited to text classification, question answering, and other NLP applications where computational efficiency matters and accuracy cannot be sacrificed.
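As a minimal example of such a use case, the sketch below fine-tunes the checkpoint for sentence classification with the Hugging Face Trainer API. The dataset choice (GLUE SST-2 via the datasets library) and the hyperparameters are illustrative assumptions, not recommended settings.

```python
# Hedged sketch: fine-tune MiniLM for binary sentence classification.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "microsoft/MiniLM-L12-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)  # adds a fresh classification head

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="minilm-sst2",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=5e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```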
