beit_base_patch16_224.in22k_ft_in22k_in1k

Maintained By
timm

BEiT Base Patch16 224

Property            Value
Parameter Count     86.5M
License             Apache 2.0
Architecture        Vision Transformer (ViT)
Paper               BEiT: BERT Pre-Training of Image Transformers
Image Size          224 x 224

What is beit_base_patch16_224.in22k_ft_in22k_in1k?

This is a vision transformer image-classification model implementing the BEiT (BERT Pre-training of Image Transformers) architecture. It was pre-trained on ImageNet-22k with self-supervised masked image modeling (MIM), using a DALL-E dVAE as the visual tokenizer, then fine-tuned first on ImageNet-22k and subsequently on ImageNet-1k.
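For reference, here is a minimal inference sketch using the standard timm loading and preprocessing calls; the image path and the top-5 printout are illustrative placeholders, not part of this card.

```python
import timm
import torch
from PIL import Image

# Load the pretrained model by the name given on this card.
model = timm.create_model(
    'beit_base_patch16_224.in22k_ft_in22k_in1k', pretrained=True
)
model.eval()

# Build the eval-time preprocessing pipeline from the model's pretrained config.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img = Image.open('example.jpg').convert('RGB')  # placeholder path
with torch.no_grad():
    logits = model(transform(img).unsqueeze(0))  # shape: (1, 1000)

# Report the five most likely ImageNet-1k classes.
probs = logits.softmax(dim=-1)
top5 = probs.topk(5)
print(top5.values, top5.indices)
```

Resolving the transform from the model's own data config keeps the resize, crop, and normalization settings consistent with those used during fine-tuning.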

Implementation Details

The model divides each 224 x 224 input into 16x16 pixel patches and processes the resulting token sequence with a transformer of 86.5M parameters, requiring 17.6 GMACs and producing 23.9M activations per forward pass. The architecture follows the standard vision transformer design while adding BERT-style pre-training; the key details are listed below, with a quick shape check after the list.

  • Pre-trained using masked image modeling on ImageNet-22k
  • Fine-tuned on ImageNet-22k and ImageNet-1k
  • Uses 16x16 pixel patches for image processing
  • Implements DALL-E dVAE as visual tokenizer
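As referenced above, a quick shape check (a sketch assuming timm's standard forward_features behavior) confirms the patch tokenization: a 224 x 224 input yields 14 x 14 = 196 patch tokens plus one [CLS] token, each with a 768-dimensional embedding.

```python
import timm
import torch

# pretrained=False is enough for a structural check and avoids a download.
model = timm.create_model(
    'beit_base_patch16_224.in22k_ft_in22k_in1k', pretrained=False
)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy batch of one image
with torch.no_grad():
    features = model.forward_features(x)  # unpooled token features

print(features.shape)  # expected: torch.Size([1, 197, 768])
```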

Core Capabilities

  • Image Classification
  • Feature Extraction
  • Transfer Learning
  • Visual Representation Learning

Frequently Asked Questions

Q: What makes this model unique?

This model combines BERT-style pre-training with vision transformers, using masked image modeling for self-supervised learning. The two-stage fine-tuning, first on ImageNet-22k and then on ImageNet-1k, yields strong, transferable visual representations.

Q: What are the recommended use cases?

The model excels at image classification and can serve as a feature extractor for downstream computer vision applications. It is particularly well-suited to transfer learning, where the pretrained backbone is adapted to a smaller task-specific dataset, as sketched below.
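A hedged transfer-learning sketch follows: the 10-class head, the head-only freezing, and the optimizer settings are illustrative assumptions, not recommendations from this card.

```python
import timm
import torch

# Reuse the pretrained backbone with a freshly initialized classification head.
model = timm.create_model(
    'beit_base_patch16_224.in22k_ft_in22k_in1k',
    pretrained=True,
    num_classes=10,  # replaces the 1000-class ImageNet-1k head (example value)
)

# Optionally freeze the backbone and train only the new head.
for name, param in model.named_parameters():
    if not name.startswith('head'):
        param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Freezing everything but the head is a cheap baseline; unfreezing the full network with a lower learning rate typically trades compute for accuracy.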
