beit-base-patch16-224-pt22k-ft22k

Maintained By
microsoft

BEiT Base Patch16 224 (ImageNet-22k)

Property          Value
License           Apache 2.0
Paper             BEiT: BERT Pre-Training of Image Transformers
Training Data     ImageNet-22k (14M images, 21,841 classes), used for both pre-training and fine-tuning
Input Resolution  224x224 pixels

What is beit-base-patch16-224-pt22k-ft22k?

BEiT is a BERT-style vision transformer that learns image representations through self-supervised pre-training. This variant was pre-trained on ImageNet-22k and then fine-tuned on the same dataset for classification. Each 224x224 input image is split into 16x16 pixel patches, which gives a 14x14 grid of 196 patches. During pre-training, a portion of the patches is masked, and the model learns to predict the discrete visual tokens that the encoder of DALL-E's VQ-VAE assigns to those patches.
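The patch split and the masked-prediction setup can be sketched in NumPy as follows. This is an illustrative sketch, not BEiT's training code. It assumes plain uniform random masking (BEiT itself uses a blockwise masking strategy), an assumed mask ratio of 0.4, and random integers standing in for the DALL-E tokenizer's visual-token ids (vocabulary size of 8192 assumed). `patchify` and `random_patch_mask` are hypothetical helper names.

```python
import numpy as np

IMAGE_SIZE = 224
PATCH_SIZE = 16
GRID = IMAGE_SIZE // PATCH_SIZE  # 14 patches per side, 196 in total


def patchify(image: np.ndarray, patch_size: int = PATCH_SIZE) -> np.ndarray:
    """Split an (H, W, C) image into (num_patches, patch_size * patch_size * C),
    ordered row by row across the patch grid."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Boolean mask with round(mask_ratio * num_patches) patches chosen uniformly.
    Simplification: BEiT's actual masking is blockwise, not uniform."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask


rng = np.random.default_rng(0)
image = rng.random((IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.float32)
patches = patchify(image)                                  # (196, 768)
mask = random_patch_mask(len(patches), mask_ratio=0.4, rng=rng)

# Hypothetical targets: in BEiT these would be the visual-token ids produced by
# the DALL-E tokenizer for each patch; random placeholders here.
visual_tokens = rng.integers(0, 8192, size=len(patches))
targets = visual_tokens[mask]  # the prediction loss covers masked positions only
print(patches.shape, int(mask.sum()), targets.shape)
```

The model sees the unmasked patches plus a learned mask embedding at the masked positions, and is trained to classify each masked position into one of the visual-token ids.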

Implementation Details

The model is a ViT-style transformer encoder, but it differs from the original ViT models in two ways. It uses relative position embeddings (similar to T5) instead of absolute position embeddings, and it classifies images by mean-pooling the final hidden states of the patch tokens instead of applying a linear layer to the final hidden state of the [CLS] token.
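To illustrate what "relative position" means on a 2D patch grid, here is a NumPy sketch that maps every (query patch, key patch) pair to an entry in a per-head bias table, indexed only by the 2D offset between the two patches. It is modeled loosely on how BEiT-style relative position bias is commonly implemented, but it is an assumption-laden sketch: it omits the extra entries involving the [CLS] token, uses a random bias table where the real model learns one, and assumes 12 attention heads for the Base model. The function names are hypothetical.

```python
import numpy as np


def relative_position_index(grid_h: int, grid_w: int) -> np.ndarray:
    """For each (query, key) patch pair, return an index into a table of
    (2*grid_h - 1) * (2*grid_w - 1) entries, one per distinct 2D offset."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()])       # (2, N), patch k at (k // w, k % w)
    rel = coords[:, :, None] - coords[:, None, :]     # (2, N, N) query minus key
    dy = rel[0] + (grid_h - 1)                        # shift into [0, 2*grid_h - 2]
    dx = rel[1] + (grid_w - 1)                        # shift into [0, 2*grid_w - 2]
    return dy * (2 * grid_w - 1) + dx                 # (N, N) flat table index


def add_relative_bias(scores: np.ndarray, bias_table: np.ndarray,
                      index: np.ndarray) -> np.ndarray:
    """scores: (heads, N, N) attention logits; bias_table: (num_offsets, heads)."""
    bias = bias_table[index]                          # (N, N, heads)
    return scores + bias.transpose(2, 0, 1)           # (heads, N, N)


grid = 224 // 16                                      # 14x14 patch grid
index = relative_position_index(grid, grid)
num_offsets = (2 * grid - 1) ** 2                     # distinct 2D offsets
heads = 12                                            # assumed head count for Base

rng = np.random.default_rng(0)
bias_table = rng.normal(size=(num_offsets, heads))    # learned in practice; random here
scores = rng.normal(size=(heads, grid * grid, grid * grid))
biased = add_relative_bias(scores, bias_table, index)
print(index.shape, num_offsets, biased.shape)
```

Because the bias depends only on the offset, two patch pairs that are the same distance and direction apart share one learned value, wherever they sit in the image.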

  • Self-supervised pre-training on ImageNet-22k (about 14 million images), followed by supervised fine-tuning on the same dataset
  • Splits each image into 16x16 pixel patches
  • Uses relative position embeddings, similar to T5
  • Normalizes each RGB channel with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5)
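The normalization step is simple arithmetic: after rescaling pixel values to [0, 1], subtracting 0.5 and dividing by 0.5 maps them onto [-1, 1]. The NumPy sketch below shows only that arithmetic. It assumes the image has already been resized to 224x224 and that the model expects a channels-first (3, H, W) layout; in practice the model's image processor would handle resizing and normalization.

```python
import numpy as np

MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def normalize(image_uint8: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) uint8 RGB image to a float32 (3, H, W) array.
    Rescale to [0, 1], then subtract the per-channel mean and divide by the
    per-channel std; with mean = std = 0.5 this maps [0, 1] onto [-1, 1]."""
    x = image_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD           # broadcasts over the last (channel) axis
    return x.transpose(2, 0, 1)    # channels-first layout (assumed)


image = np.zeros((224, 224, 3), dtype=np.uint8)
image[:, 112:] = 255               # left half black, right half white
pixels = normalize(image)
print(pixels.shape, pixels.min(), pixels.max())
```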

Core Capabilities

  • Image classification across 21,841 classes
  • Feature extraction for downstream tasks
  • Representations learned through self-supervised masked patch prediction
  • Fixed input resolution of 224x224 pixels
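Classification and feature extraction both start from the same pooled vector. The NumPy sketch below averages the final hidden states of the patch tokens (skipping token 0, assumed to be [CLS]), normalizes the result, and applies a linear head. It is a sketch under stated assumptions: the layer norm has no learned scale or shift, the hidden states and weights are random, the hidden size of 768 is the usual Base size, and the head has 100 outputs instead of the released 21,841 to keep the demo small. All function names are hypothetical.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Layer norm over the last axis without learned scale/shift (simplification)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def mean_pool_patches(hidden_states: np.ndarray) -> np.ndarray:
    """hidden_states: (batch, 1 + num_patches, hidden). Token 0 is assumed to be
    [CLS]; it is skipped, and the patch tokens are averaged into one vector."""
    return layer_norm(hidden_states[:, 1:, :].mean(axis=1))


def classify(hidden_states: np.ndarray, weight: np.ndarray,
             bias: np.ndarray) -> np.ndarray:
    features = mean_pool_patches(hidden_states)   # (batch, hidden): reusable features
    return features @ weight + bias               # (batch, num_classes) logits


rng = np.random.default_rng(0)
batch, num_patches, hidden = 2, 196, 768
num_classes = 100                                 # released head: 21,841 outputs
hidden_states = rng.normal(size=(batch, 1 + num_patches, hidden))
weight = rng.normal(scale=0.02, size=(hidden, num_classes))
bias = np.zeros(num_classes)

logits = classify(hidden_states, weight, bias)
top5 = np.argsort(logits, axis=-1)[:, ::-1][:, :5]   # highest-scoring classes first
print(logits.shape, top5)
```

For feature extraction, `mean_pool_patches` alone gives one vector per image; for classification, the linear head turns that vector into class scores.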

Frequently Asked Questions

Q: What makes this model unique?

The model applies BERT-style masked pre-training to images: instead of predicting masked words, it predicts the visual tokens of masked image patches. Combined with relative position embeddings, this pre-training is aimed at producing representations that transfer well when the model is fine-tuned on downstream image classification tasks.

Q: What are the recommended use cases?

The model is well suited to image classification, feature extraction, and transfer learning. Because it was fine-tuned on ImageNet-22k, it is particularly suitable for scenarios that require classification among a large number of categories.
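One simple form of transfer learning is a linear probe: keep the pre-trained model frozen, extract one feature vector per image, and train only a new classification head for the downstream labels. The NumPy sketch below trains such a head with softmax regression and full-batch gradient descent. It assumes synthetic, well-separated feature vectors in place of real extracted features, and a 64-dimensional feature size chosen only to keep the demo fast; `train_linear_probe` is a hypothetical name.

```python
import numpy as np


def train_linear_probe(features, labels, num_classes, lr=0.05, steps=300):
    """Softmax regression on frozen features via full-batch gradient descent.
    Only this new head is trained; whatever produced `features` stays fixed."""
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # d(mean cross-entropy)/d(logits)
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


# Synthetic stand-in for features extracted from the frozen model.
rng = np.random.default_rng(0)
num_classes, dim, per_class = 3, 64, 50
centers = rng.normal(size=(num_classes, dim))
labels = np.repeat(np.arange(num_classes), per_class)
features = centers[labels] + rng.normal(scale=0.3, size=(len(labels), dim))

w, b = train_linear_probe(features, labels, num_classes)
accuracy = ((features @ w + b).argmax(axis=1) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Full fine-tuning, which also updates the backbone, usually adapts better than a linear probe but costs more compute; the probe is a cheap first check of how well the frozen features separate the new classes.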
