squeezebert-uncased

Maintained By: squeezebert

SqueezeBERT-Uncased

License: BSD
Paper: SqueezeBERT Paper
Training Data: BookCorpus, Wikipedia
Architecture: BERT-based with grouped convolutions

What is squeezebert-uncased?

SqueezeBERT-uncased is a transformer model that adapts BERT's architecture for efficient inference on mobile devices. It maintains BERT's core functionality while replacing the traditional fully-connected layers with grouped convolutions, resulting in significantly faster inference: 4.3x faster than bert-base-uncased on a Google Pixel 3 smartphone.
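
The checkpoint is published on the Hugging Face Hub as squeezebert/squeezebert-uncased and loads with the standard transformers auto classes. A minimal sketch (the example sentence is illustrative, and the exact printed values depend on your transformers version):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the uncased SqueezeBERT encoder from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("squeezebert/squeezebert-uncased")
model = AutoModel.from_pretrained("squeezebert/squeezebert-uncased")

# Total parameter count, for reference.
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.1f}M parameters")

# Encode a sentence; lowercasing is handled by the uncased tokenizer.
inputs = tokenizer("SqueezeBERT runs efficiently on mobile devices.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```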

Implementation Details

The model is pretrained with the LAMB optimizer using a global batch size of 8192, a learning rate of 2.5e-3, and a warmup proportion of 0.28. Pretraining runs for 56k steps with a maximum sequence length of 128, followed by 6k steps with a sequence length of 512, using Masked Language Modeling (MLM) and Sentence Order Prediction (SOP) objectives.

  • Case-insensitive tokenization
  • Efficient grouped convolution architecture
  • Optimized for mobile deployment
  • No distillation used in pretraining
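
Because MLM is one of the pretraining objectives, the checkpoint can be used directly for fill-mask inference. A minimal sketch, assuming the transformers library and an illustrative example sentence:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("squeezebert/squeezebert-uncased")
model = AutoModelForMaskedLM.from_pretrained("squeezebert/squeezebert-uncased")

# Masked input; the tokenizer supplies the [MASK] token string.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the highest-scoring token for it.
mask_positions = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```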

Core Capabilities

  • Fast inference on mobile devices
  • Text classification tasks
  • Masked language modeling
  • Sentence order prediction

Frequently Asked Questions

Q: What makes this model unique?

SqueezeBERT's main innovation is its use of grouped convolutions instead of fully-connected layers, making it significantly more efficient for mobile deployment while maintaining BERT-like performance.
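
To illustrate the idea (not SqueezeBERT's exact layer configuration), the sketch below compares a position-wise fully-connected layer with a grouped 1D convolution of the same width; the group count of 4 is an assumption chosen for illustration:

```python
import torch
import torch.nn as nn

d = 768       # hidden size, as in bert-base and squeezebert-uncased
groups = 4    # illustrative group count; see the SqueezeBERT paper for the actual settings

# Position-wise fully-connected layer, as in standard BERT: one weight per (in, out) pair.
dense = nn.Linear(d, d)

# Grouped 1D convolution over the sequence: each group only mixes d / groups channels.
grouped = nn.Conv1d(d, d, kernel_size=1, groups=groups)

x = torch.randn(1, 128, d)                               # (batch, seq_len, hidden)
y_dense = dense(x)                                       # (1, 128, 768)
y_grouped = grouped(x.transpose(1, 2)).transpose(1, 2)   # Conv1d expects (batch, channels, seq)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense), count(grouped))  # the grouped layer has roughly 1/groups as many weights
```

Fewer weights per layer means fewer multiply-accumulate operations, which is where the mobile speedup comes from.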

Q: What are the recommended use cases?

The model is particularly well-suited for mobile applications requiring BERT-like capabilities. For text classification tasks, it's recommended to use the squeezebert-mnli-headless variant as a starting point.
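
A minimal starting point for that recommendation, assuming the transformers library; the num_labels value and example sentence are placeholders chosen for illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# squeezebert-mnli-headless ships without the MNLI classification head,
# so a fresh head is initialized for your own label set.
model_id = "squeezebert/squeezebert-mnli-headless"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer("This phone runs BERT-class models smoothly.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # untrained head: meaningful only after fine-tuning
print(logits.shape)                  # torch.Size([1, 2])
```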
