T5-Efficient-Tiny
| Property | Value |
|---|---|
| Parameter Count | 15.58M |
| License | Apache 2.0 |
| Memory Usage | 62.32 MB (fp32) / 31.16 MB (fp16) |
| Paper | Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers |
What is t5-efficient-tiny?
T5-Efficient-Tiny is a compact variant of Google's T5 model, built with a deep-narrow architecture that prioritizes model depth over width. It contains 15.58 million parameters and is designed to deliver strong downstream task performance for its size.
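To sanity-check the figures in the table above, a minimal sketch along these lines loads the checkpoint and counts its parameters. The Hub identifier `google/t5-efficient-tiny` is assumed here and may differ for your copy.

```python
from transformers import T5ForConditionalGeneration

# Assumed Hub identifier for this checkpoint.
model = T5ForConditionalGeneration.from_pretrained("google/t5-efficient-tiny")

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.2f}M")        # expected to be close to 15.58M
print(f"fp32 size:  {n_params * 4 / 1e6:.2f} MB")  # close to 62.32 MB
print(f"fp16 size:  {n_params * 2 / 1e6:.2f} MB")  # close to 31.16 MB
```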
Implementation Details
The model follows a deep-narrow design with 4 encoder layers and 4 decoder layers, a hidden dimension of 256, and 4 attention heads. It was pre-trained on the C4 dataset for 524,288 steps using span-based masked language modeling. Key hyperparameters (see the configuration sketch after this list):
- Feed-forward dimension: 1024
- Key/Value dimension (per head): 32
- Attention heads: 4
- Model depth: 4 layers (both encoder and decoder)
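For illustration, the hyperparameters above can be mapped onto a Hugging Face `T5Config`. This is a sketch of the architecture only; values not listed in this card (such as the vocabulary size) are left at the library defaults, and it does not reproduce the pretrained weights.

```python
from transformers import T5Config, T5ForConditionalGeneration

# Deep-narrow configuration from the list above; unlisted fields keep their defaults.
config = T5Config(
    d_model=256,           # hidden dimension
    d_ff=1024,             # feed-forward dimension
    d_kv=32,               # key/value dimension per attention head
    num_heads=4,           # attention heads
    num_layers=4,          # encoder depth
    num_decoder_layers=4,  # decoder depth
)
model = T5ForConditionalGeneration(config)  # randomly initialized, for inspection only
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")
```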
Core Capabilities
- Text-to-text generation tasks
- Efficient parameter utilization through deep-narrow architecture
- Optimized for English language tasks
- Suitable for fine-tuning on downstream tasks like summarization and question answering
Frequently Asked Questions
Q: What makes this model unique?
The model's deep-narrow architecture makes it particularly efficient for downstream tasks: at a similar parameter count, it tends to outperform wider but shallower configurations after fine-tuning.
Q: What are the recommended use cases?
This model is a pretrained checkpoint that needs to be fine-tuned for specific tasks such as summarization, question answering, or text classification. It is particularly suitable for applications where the balance between model size and performance is crucial; a minimal fine-tuning sketch follows.
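As a rough illustration of that workflow, the sketch below fine-tunes the checkpoint on a single toy (document, summary) pair and then generates from it. It assumes the Hub identifier `google/t5-efficient-tiny` and the conventional "summarize:" prefix, and stands in for a real training loop over an actual dataset with evaluation.

```python
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed Hub identifier; swap in your own copy of the checkpoint if needed.
tokenizer = AutoTokenizer.from_pretrained("google/t5-efficient-tiny")
model = T5ForConditionalGeneration.from_pretrained("google/t5-efficient-tiny")

# Toy (document, summary) pair standing in for a real summarization dataset.
document = "summarize: The quick brown fox jumped over the lazy dog near the river bank."
summary = "A fox jumped over a dog."

inputs = tokenizer(document, return_tensors="pt")
labels = tokenizer(summary, return_tensors="pt").input_ids

optimizer = AdamW(model.parameters(), lr=1e-4)
model.train()
for _ in range(3):  # a few toy optimization steps
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, generate a summary for the input text.
model.eval()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```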