maxvit_nano_rw_256.sw_in1k

timm

A lightweight MaxViT variant with 15.45M parameters, designed for 256x256 images, that reaches 82.93% top-1 accuracy on ImageNet-1k.

Property          Value
Parameter Count   15.45M
Top-1 Accuracy    82.93%
Image Size        256x256
License           Apache 2.0
Paper             MaxViT: Multi-Axis Vision Transformer

What is maxvit_nano_rw_256.sw_in1k?

maxvit_nano_rw_256.sw_in1k is a lightweight variant of the MaxViT architecture, distributed through the timm library and designed for 256x256 input images. It is a hybrid model that combines convolutional blocks with transformer-style self-attention, reaching 82.93% top-1 accuracy on ImageNet-1k with 15.45M parameters.

Implementation Details

The model uses multi-axis attention: attention within local windows captures nearby interactions, while attention over a strided grid captures sparse global ones. Its main components and measured costs are:

  • MBConv (depthwise-separable) convolution blocks
  • Dual self-attention mechanisms with window and grid partitioning
  • A PyTorch implementation in timm, where the "rw" in the name marks a configuration by Ross Wightman
  • 4.46 GMACs computational complexity
  • 30.28M activations
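
The window and grid partitioning mentioned above can be illustrated with a small NumPy sketch. This is not the timm implementation: the function names, the 8x8 toy feature map, and the partition size of 4 are assumptions chosen only to show which pixels end up attending to each other under each scheme.

```python
import numpy as np


def window_partition(x: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping p x p windows.

    Returns (num_windows, p * p, C). Each group holds spatially
    adjacent pixels, so attention inside a group is local.
    """
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = x.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p, c)


def grid_partition(x: np.ndarray, g: int) -> np.ndarray:
    """Split an (H, W, C) map into groups sampled on a g x g grid.

    Returns (num_groups, g * g, C). Each group holds pixels spaced
    H/g rows and W/g columns apart, so attention inside a group
    spans the whole map sparsely.
    """
    h, w, c = x.shape
    assert h % g == 0 and w % g == 0, "H and W must be divisible by g"
    x = x.reshape(g, h // g, g, w // g, c)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/g, W/g, g, g, C)
    return x.reshape(-1, g * g, c)


# Toy input (hypothetical sizes): each "pixel" stores its own (row, col).
coords = np.stack(
    np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), axis=-1
)
windows = window_partition(coords, 4)
grids = grid_partition(coords, 4)

# Rows covered by the first group under each scheme:
print(np.unique(windows[0][:, 0]))  # contiguous rows 0..3
print(np.unique(grids[0][:, 0]))    # strided rows 0, 2, 4, 6
```

Both functions produce groups of the same size, so the attention cost per group is identical; only the choice of which pixels share a group changes.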

Core Capabilities

  • Image Classification on ImageNet-1k dataset
  • Feature extraction with multiple resolution outputs
  • Reported inference throughput of 1,218.17 samples/sec (hardware and batch size are not stated)
  • Balanced performance for edge deployment scenarios

Frequently Asked Questions

Q: What makes this model unique?

This model trades size against accuracy: with 15.45M parameters and 4.46 GMACs it reaches 82.93% top-1 accuracy on ImageNet-1k at 256x256 resolution. Its multi-axis attention captures local features through window attention and global features through grid attention while keeping the parameter count relatively small.

Q: What are the recommended use cases?

The model is well-suited for:

  • Resource-constrained environments that still need solid classification accuracy
  • Real-time image classification tasks
  • Feature extraction for downstream computer vision tasks
  • Scenarios where 256x256 resolution is sufficient for the application
