SqueezeBERT-MNLI

Property	Value
License	BSD
Paper	SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
Training Data	BookCorpus, Wikipedia, MNLI
Primary Language	English

What is squeezebert-mnli?

SqueezeBERT-MNLI is an efficient transformer model that replaces the position-wise fully-connected layers of a BERT-style encoder with grouped convolutions. It was pretrained on BookCorpus and Wikipedia, then finetuned on the Multi-Genre Natural Language Inference (MNLI) dataset. The SqueezeBERT authors report that the architecture runs 4.3x faster than bert-base-uncased on a Google Pixel 3 smartphone.
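To make the grouped-convolution idea concrete, here is a minimal pure-Python sketch. It treats a position-wise fully-connected layer as a kernel-size-1 convolution and shows how splitting the channels into groups cuts the weight count. The hidden size of 768 is the BERT-base value; the group count of 4 and both function names are assumptions chosen for illustration, not values or APIs taken from this model card.

```python
def pointwise_weight_count(channels_in: int, channels_out: int, groups: int = 1) -> int:
    """Weights in a kernel-size-1 convolution; groups=1 matches a fully-connected layer."""
    if channels_in % groups or channels_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (channels_in // groups) * (channels_out // groups) * groups


def grouped_pointwise(x, weights, groups):
    """Apply a grouped kernel-size-1 convolution to a single token vector x.

    weights[g] is an (out_per_group x in_per_group) matrix for group g, so each
    output channel only sees the input channels of its own group.
    """
    in_per_group = len(x) // groups
    out = []
    for g in range(groups):
        chunk = x[g * in_per_group:(g + 1) * in_per_group]
        for row in weights[g]:
            out.append(sum(w * v for w, v in zip(row, chunk)))
    return out


HIDDEN = 768  # BERT-base hidden size
GROUPS = 4    # illustrative assumption, not stated in this model card

dense = pointwise_weight_count(HIDDEN, HIDDEN)            # one fully-connected layer
grouped = pointwise_weight_count(HIDDEN, HIDDEN, GROUPS)  # same shape, grouped
print(dense, grouped, dense // grouped)
```

With g groups, a layer of the same input and output width needs 1/g of the weights and multiply-adds, which is where the efficiency gain comes from.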

Implementation Details

The model was pretrained with the LAMB optimizer using a global batch size of 8192, a learning rate of 2.5e-3, and a warmup proportion of 0.28. Pretraining ran for 56,000 steps at a maximum sequence length of 128, followed by 6,000 steps at a maximum sequence length of 512. The architecture follows BERT-base, except that the position-wise fully-connected layers are replaced with grouped convolutions.
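The hyperparameters above can be turned into concrete numbers with a short sketch. It assumes the warmup proportion applies to each phase separately and that the learning rate decays linearly to zero after warmup. The model card states neither, so treat both as illustrative assumptions. The function names are hypothetical.

```python
PEAK_LR = 2.5e-3
WARMUP_PROPORTION = 0.28
GLOBAL_BATCH = 8192
PHASES = [(56_000, 128), (6_000, 512)]  # (steps, maximum sequence length)


def warmup_steps(total_steps: int, proportion: float = WARMUP_PROPORTION) -> int:
    """Number of steps spent ramping the learning rate up from zero."""
    return round(total_steps * proportion)


def learning_rate(step: int, total_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (decay shape is assumed)."""
    warm = warmup_steps(total_steps)
    if step < warm:
        return PEAK_LR * step / warm
    return PEAK_LR * (total_steps - step) / (total_steps - warm)


for steps, seq_len in PHASES:
    # Sequences seen = steps * global batch size.
    print(seq_len, steps, warmup_steps(steps), steps * GLOBAL_BATCH)
```

Under these assumptions the first phase warms up for 15,680 steps and the second for 1,680.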

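Uncased (case-insensitive) text processing
Pretrained with masked language modeling (MLM) and sentence order prediction (SOP) objectives
Finetuned on MNLI as part of the "bells and whistles" finetuning approach described in the SqueezeBERT paper
Designed for low-latency inference on mobile devices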

Core Capabilities

Natural Language Inference tasks
Efficient mobile deployment
Text classification
Sequence understanding
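For natural language inference, an MNLI-finetuned classifier produces three logits per premise/hypothesis pair, one each for contradiction, neutral, and entailment. The sketch below shows one way to turn such logits into a label and probabilities. The label order in ASSUMED_LABELS, the example logits, and the function names are assumptions for illustration. Check the label mapping shipped with the checkpoint before relying on any particular order.

```python
import math

# Assumed label order; verify against the checkpoint's own label mapping.
ASSUMED_LABELS = ("contradiction", "neutral", "entailment")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_nli(logits, labels=ASSUMED_LABELS):
    """Return the most likely label and a label-to-probability mapping."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Made-up logits standing in for a model's output on one sentence pair.
label, scores = interpret_nli([-1.2, 0.3, 2.9])
print(label, scores)
```

In practice the logits would come from running the model on a tokenized premise/hypothesis pair. This helper only covers the post-processing step.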

Frequently Asked Questions

Q: What makes this model unique?

SqueezeBERT-MNLI's primary distinction is its use of grouped convolutions in place of the position-wise fully-connected layers found in BERT. The authors report a 4.3x speedup over bert-base-uncased on a Google Pixel 3 while maintaining competitive accuracy.

Q: What are the recommended use cases?

The model is particularly well-suited for mobile applications requiring natural language inference, text classification, and general language understanding tasks where computational efficiency is crucial.