uniformer_image

Maintained By: Sense-X

UniFormer Image Model

Property: Value
License: MIT
Paper: UniFormer Paper
Training Data: ImageNet
Model Size (Small): 22M parameters

What is uniformer_image?

UniFormer is a vision transformer that combines convolution and self-attention in a single transformer-style architecture. Developed by researchers at Sense-X, it reports 86.3% top-1 accuracy on ImageNet-1K classification without additional training data.

Implementation Details

The model implements a hybrid architecture using local MHRA (Multi-Head Relation Aggregation) in shallow layers to reduce computational complexity and global MHRA in deeper layers for learning global token relationships. The architecture comes in different variants, with UniFormer-S containing 22M parameters and UniFormer-B scaling up to 50M parameters.
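To make the complexity argument concrete, the sketch below counts token-to-token affinities for local and global aggregation. It is a back-of-the-envelope illustration, not the model's implementation: the first-stage stride of 4, the 5x5 local neighbourhood, the 16x overall downsampling in the deeper stage, and all function names are assumptions chosen for illustration, and border effects are ignored.

```python
# Assumed settings for illustration only; they are not stated in this document.
IMAGE_SIZE = 224   # input resolution supported by the model
PATCH_STRIDE = 4   # assumed downsampling factor of the first (shallow) stage
DEEP_STRIDE = 16   # assumed overall downsampling factor of a deeper stage
LOCAL_WINDOW = 5   # assumed side length of the local neighbourhood


def token_count(image_size: int, stride: int) -> int:
    """Number of tokens on a square feature map after downsampling."""
    side = image_size // stride
    return side * side


def local_pairs(num_tokens: int, window: int) -> int:
    """Affinities computed when each token only sees a window x window patch."""
    return num_tokens * window * window


def global_pairs(num_tokens: int) -> int:
    """Affinities computed when every token is compared with every token."""
    return num_tokens * num_tokens


shallow_tokens = token_count(IMAGE_SIZE, PATCH_STRIDE)  # 56 * 56 tokens
deep_tokens = token_count(IMAGE_SIZE, DEEP_STRIDE)      # 14 * 14 tokens

print("shallow stage, local aggregation :", local_pairs(shallow_tokens, LOCAL_WINDOW))
print("shallow stage, global aggregation:", global_pairs(shallow_tokens))
print("deep stage, global aggregation   :", global_pairs(deep_tokens))
```

Under these assumptions, global aggregation on the high-resolution shallow feature map would need roughly two orders of magnitude more affinities than local aggregation, while on the downsampled deep feature map the global cost is modest. This is the trade-off behind using local MHRA early and global MHRA late.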

  • Supports 224x224 resolution image input
  • Implements efficient local-global token mixing
  • Provides multiple model sizes for different computational requirements
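As a rough guide to choosing between the model sizes, the sketch below estimates weight storage from the parameter counts given above and picks the largest variant that fits a budget. The helper names are hypothetical, and the estimate assumes 32-bit floating-point weights only; activations, optimizer state, and framework overhead are not included.

```python
from typing import Optional

# Parameter counts as stated in this document.
VARIANTS = {
    "UniFormer-S": 22_000_000,
    "UniFormer-B": 50_000_000,
}

BYTES_PER_FP32 = 4  # assumption: weights stored as 32-bit floats


def weight_megabytes(num_params: int) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_FP32 / 1e6


def pick_variant(budget_mb: float) -> Optional[str]:
    """Return the largest variant whose weights fit in budget_mb, or None."""
    fitting = {
        name: params
        for name, params in VARIANTS.items()
        if weight_megabytes(params) <= budget_mb
    }
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


for name, params in VARIANTS.items():
    print(f"{name}: about {weight_megabytes(params):.0f} MB of fp32 weights")

print("Budget 100 MB ->", pick_variant(100))
print("Budget 256 MB ->", pick_variant(256))
```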

Core Capabilities

  • Image Classification (86.3% top-1 accuracy on ImageNet-1K)
  • Video Classification (82.9% / 84.8% top-1 accuracy on Kinetics-400 / Kinetics-600)
  • Object Detection (53.8% box AP on COCO)
  • Semantic Segmentation (50.8% mIoU on ADE20K)
  • Pose Estimation (77.4% AP on COCO)
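For readers unfamiliar with the classification metric quoted above, top-1 accuracy is the share of samples whose highest-scoring class matches the ground-truth label. The following is a minimal sketch on made-up scores; the function name and the toy data are illustrative and unrelated to the reported benchmark numbers.

```python
from typing import Sequence


def top1_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Percentage of samples whose arg-max class equals the true label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy example: four samples, three classes (values are made up).
toy_logits = [
    [0.1, 2.0, 0.3],  # predicts class 1
    [1.5, 0.2, 0.1],  # predicts class 0
    [0.0, 0.4, 0.9],  # predicts class 2
    [0.7, 0.6, 0.1],  # predicts class 0, but the label is 1
]
toy_labels = [1, 0, 2, 1]

print(f"top-1 accuracy: {top1_accuracy(toy_logits, toy_labels):.1f}%")
```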

Frequently Asked Questions

Q: What makes this model unique?

UniFormer integrates convolution and self-attention within a single transformer-style design, using local aggregation in shallow layers and global aggregation in deeper layers. It reports state-of-the-art results across multiple vision tasks without requiring additional training data beyond ImageNet-1K.

Q: What are the recommended use cases?

The model is particularly well-suited for image classification tasks, but its architecture makes it versatile enough for a wide range of computer vision applications, including video classification, object detection, semantic segmentation, and pose estimation.
