maxvit_base_tf_512.in21k_ft_in1k

timm

MaxViT base model with 120M params, trained on ImageNet-21k and fine-tuned on ImageNet-1k. Achieves 88.2% top-1 accuracy at 512px resolution.

Property | Value
Parameter Count | 120M
Model Type | Image Classification
License | Apache 2.0
Input Resolution | 512x512
Top-1 Accuracy | 88.20%
Paper | MaxViT: Multi-Axis Vision Transformer

What is maxvit_base_tf_512.in21k_ft_in1k?

This is the base-size MaxViT (Multi-Axis Vision Transformer) image classification model. MaxViT is a hybrid architecture: it pairs convolutional blocks with self-attention so that every block handles both local and global context. The weights were pretrained on ImageNet-21k and then fine-tuned on ImageNet-1k.

Implementation Details

Each MaxViT block pairs an MBConv convolution with two self-attention steps: window attention, which attends within local non-overlapping windows, and grid attention, which attends across a sparse, strided grid spanning the whole feature map. With 120M parameters, the model processes images at 512x512 resolution and achieves 88.20% top-1 accuracy on ImageNet-1k.

  • Uniform blocks across stages combining MBConv and self-attention
  • Dual partitioning scheme with window and grid attention
  • 138.02 GMACs computational complexity
  • 703.99M activations
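The two partitioning schemes listed above can be illustrated with plain array reshapes. The following is a minimal NumPy sketch, not the timm implementation: the function names `window_partition` and `grid_partition` and the toy 8x8 input are assumptions chosen for illustration, and the reshapes follow the block and grid partitioning described in the MaxViT paper.

```python
import numpy as np


def window_partition(x, p):
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p*p, C); each group holds spatially adjacent tokens.
    """
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (h/p, w/p, p, p, c)
    return x.reshape(-1, p * p, c)


def grid_partition(x, g):
    """Split an (H, W, C) map into groups of g x g tokens sampled on a strided grid.

    Returns (num_groups, g*g, C); each group holds tokens spread over the whole map.
    """
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)  # (h/g, w/g, g, g, c)
    return x.reshape(-1, g * g, c)


# Toy example: an 8x8 single-channel map whose values are the flat token index.
x = np.arange(64).reshape(8, 8, 1)

local_tokens = window_partition(x, 4)[0, :, 0]
global_tokens = grid_partition(x, 4)[0, :, 0]

print(local_tokens[:4].tolist())   # [0, 1, 2, 3]  (neighbouring positions)
print(global_tokens[:4].tolist())  # [0, 2, 4, 6]  (every second position)
```

Self-attention is then applied independently within each group. Window groups mix nearby tokens, while grid groups mix tokens sampled from across the full feature map, which is how a block covers both local and global context at a cost that depends on the group size rather than on full-image attention.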

Core Capabilities

  • High-resolution image classification (512x512)
  • Feature extraction
  • Local (window) and global (grid) attention in every block
  • 88.20% top-1 accuracy on ImageNet-1k
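For classification, a typical way to run the model is through the timm library. The following is a minimal sketch that assumes `timm`, `torch`, and `Pillow` are installed and that the pretrained weights can be downloaded; the helper names `load_model` and `classify` are hypothetical, and the imports are deferred into the functions so the snippet can be loaded without those packages present.

```python
MODEL_NAME = "maxvit_base_tf_512.in21k_ft_in1k"


def load_model(pretrained=True):
    """Create the classifier in eval mode (downloads weights if pretrained=True)."""
    import timm  # deferred import: only needed when the model is actually built

    model = timm.create_model(MODEL_NAME, pretrained=pretrained)
    return model.eval()


def classify(image_path, k=5):
    """Return the top-k (class_index, probability) pairs for one image file."""
    import timm
    import torch
    from PIL import Image

    model = load_model()

    # Build the preprocessing pipeline from the model's own data config
    # (input size, interpolation, normalisation), rather than hard-coding it.
    data_config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**data_config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    batch = transform(image).unsqueeze(0)  # add a batch dimension

    with torch.no_grad():
        logits = model(batch)

    probs = logits.softmax(dim=-1)
    top_probs, top_indices = probs.topk(k)
    return list(zip(top_indices[0].tolist(), top_probs[0].tolist()))
```

Calling `classify("photo.jpg")` with a path of your own returns ImageNet-1k class indices with their probabilities; mapping indices to label names is left out of this sketch.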

Frequently Asked Questions

Q: What makes this model unique?

Each block combines MBConv convolution with two forms of attention (window and grid), so the model captures local and global image features in the same block. It was pretrained on ImageNet-21k and then fine-tuned on ImageNet-1k, so its features draw on a much larger label set than ImageNet-1k alone.

Q: What are the recommended use cases?

The model is particularly well-suited for high-resolution image classification tasks, feature extraction, and as a backbone for downstream computer vision tasks. Its 512x512 resolution makes it ideal for applications requiring detailed image analysis.
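For feature extraction and backbone use, timm exposes two common options at model creation: `num_classes=0` drops the classifier head and returns pooled embeddings, and `features_only=True` returns a list of intermediate feature maps. The sketch below assumes `timm` and `torch` are installed and the weights are downloadable; the helper names and the `INPUT_SIZE` constant are illustrative, with the size taken from the 512x512 resolution stated above.

```python
MODEL_NAME = "maxvit_base_tf_512.in21k_ft_in1k"
INPUT_SIZE = (3, 512, 512)  # channels, height, width expected by this checkpoint


def extract_embeddings(batch):
    """Return one pooled feature vector per image, shape (N, num_features).

    `batch` is a float tensor of shape (N, 3, 512, 512), already normalised.
    """
    import timm
    import torch

    # num_classes=0 removes the classification layer, leaving pooled features.
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=0).eval()
    with torch.no_grad():
        return model(batch)


def extract_feature_maps(batch):
    """Return a list of multi-scale feature maps for use as a backbone."""
    import timm
    import torch

    # features_only=True wraps the model so it outputs intermediate stage maps.
    model = timm.create_model(MODEL_NAME, pretrained=True, features_only=True).eval()
    with torch.no_grad():
        return model(batch)
```

Pooled embeddings suit retrieval or linear-probe setups, while the multi-scale maps are the usual input to detection or segmentation heads. In practice the model would be created once and reused rather than rebuilt on every call as in this sketch.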
