UltraFastBERT-1x11-long
| Property | Value |
|---|---|
| Parameter Count | 189M |
| License | MIT |
| Paper | arXiv:2311.10770 |
| Tensor Type | F32 |
| Training Data | EleutherAI/pile |
What is UltraFastBERT-1x11-long?
UltraFastBERT-1x11-long is a BERT variant that uses only 0.3% of its neurons during inference while delivering performance comparable to traditional BERT models. For each layer inference, it selectively engages just 12 of 4095 neurons by replacing the standard feedforward layers with fast feedforward networks (FFFs).
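The core idea behind FFFs is to arrange the feedforward neurons as a balanced binary tree and, at inference time, evaluate only the neurons on a single root-to-leaf path. The sketch below illustrates this conditional execution; it is a simplified illustration rather than the authors' implementation, and the layer width, activation choice, and routing details are assumptions.

```python
# Simplified sketch of a fast feedforward (FFF) layer with hard, tree-based
# routing at inference time. Shapes, activation, and initialization here are
# illustrative assumptions, not the reference implementation.
import torch
import torch.nn as nn

class FFFSketch(nn.Module):
    def __init__(self, width: int, path_len: int):
        super().__init__()
        self.path_len = path_len              # neurons touched per token (12 here)
        self.n_nodes = 2 ** path_len - 1      # total neurons in the tree (4095 here)
        self.w_in = nn.Parameter(0.02 * torch.randn(self.n_nodes, width))
        self.w_out = nn.Parameter(0.02 * torch.randn(self.n_nodes, width))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, width); each row visits only `path_len` of the n_nodes neurons
        node = torch.zeros(x.shape[0], dtype=torch.long, device=x.device)  # root
        y = torch.zeros_like(x)
        for _ in range(self.path_len):
            logit = (x * self.w_in[node]).sum(dim=-1)         # one neuron per token
            y = y + torch.relu(logit).unsqueeze(-1) * self.w_out[node]
            node = 2 * node + 1 + (logit > 0).long()          # descend left or right
        return y

# With path_len=12 the tree holds 2**12 - 1 = 4095 neurons, matching the
# "12 out of 4095" figure quoted above.
layer = FFFSketch(width=768, path_len=12)
out = layer(torch.randn(4, 768))
```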
Implementation Details
Replacing dense feedforward layers with fast feedforward networks (FFFs) yields a 78x CPU speedup over an optimized baseline feedforward implementation, and a 40x speedup over equivalent batched feedforward inference in a PyTorch implementation.
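As a back-of-envelope check on why such speedups are plausible: the 78x and 40x figures are measured results, while the ratio below is only the theoretical feedforward FLOP reduction implied by the neuron counts quoted above.

```python
# Fraction of feedforward neurons touched per token, and the implied theoretical
# reduction in feedforward FLOPs (ignoring attention, routing overhead, and
# memory effects, which is why measured speedups are lower).
neurons_total = 4095   # neurons per FFF layer
neurons_used = 12      # neurons evaluated on one root-to-leaf path

fraction = neurons_used / neurons_total              # ~0.003, i.e. ~0.3%
theoretical_speedup = neurons_total / neurons_used   # ~341x for the FF part alone
print(f"{fraction:.2%} of neurons used, up to {theoretical_speedup:.0f}x fewer FF FLOPs")
```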
- Pretrained on the EleutherAI/pile dataset
- Implements a selective neuron engagement mechanism
- Achieves an 83.0% average score on GLUE benchmark tasks
- Released under the MIT license for open development
Core Capabilities
- Masked Language Modeling
- Efficient inference with minimal computational resources
- Strong performance on GLUE tasks (MNLI, QQP, QNLI, SST-2, STS-B, MRPC, RTE)
- Compatible with standard transformer libraries
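A minimal masked-language-modeling example with the transformers library is sketched below. The Hub identifier and the use of trust_remote_code are assumptions; the official UltraFastBERT repository documents the supported loading path and any custom code required.

```python
# Hypothetical masked-LM usage sketch with Hugging Face transformers.
# The model identifier and trust_remote_code flag below are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "pbelcak/UltraFastBERT-1x11-long"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Read out the top predictions at the masked position
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_tokens = logits[0, mask_pos].topk(5, dim=-1).indices[0]
print(tokenizer.convert_ids_to_tokens(top_tokens.tolist()))
```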
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to use only 0.3% of its neurons during inference while maintaining BERT-level performance. It achieves this through fast feedforward networks, which translate into substantial inference speedups.
Q: What are the recommended use cases?
The model is primarily intended for research and for fine-tuning on downstream tasks such as GLUE. Note that this is a raw pretraining checkpoint: it is untested and not fit for deployment in production environments.
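For fine-tuning on a GLUE task, a conventional transformers Trainer setup along the following lines could be adapted. This is a sketch under the assumption that a sequence-classification head can be attached to the checkpoint via trust_remote_code; the fine-tuning scripts in the official repository are the authoritative route.

```python
# Hypothetical GLUE (SST-2) fine-tuning sketch using transformers and datasets.
# The model identifier and classification-head loading are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_id = "pbelcak/UltraFastBERT-1x11-long"   # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2, trust_remote_code=True)

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="ultrafastbert-sst2",
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())
```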