crossvit_9_240.in1k

timm

CrossViT-9 is a compact vision transformer (8.55M parameters) for 240x240 images that uses cross-attention to learn multi-scale features.

Property          Value
Parameter Count   8.55M
Image Size        240x240
License           Apache 2.0
Paper             CrossViT Paper
Dataset           ImageNet-1k

What is crossvit_9_240.in1k?

CrossViT-9 is a vision transformer for image classification built on a cross-attention multi-scale architecture. Developed by IBM researchers, it is a lightweight variant with 8.55M parameters, designed for 240x240 pixel inputs at a compute cost of about 1.8 GMACs per image.

Implementation Details

The model uses a dual-branch architecture that processes each image at two scales in parallel. Cross-attention lets the two resolution pathways exchange information, so the features each branch produces are informed by the other scale.

  • Multi-scale processing with cross-attention mechanism
  • Compact architecture with 8.55M parameters
  • 9.5M activations for feature processing
  • Optimized for 240x240 resolution inputs
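
The information exchange between branches can be illustrated with a small NumPy sketch. It is not the timm implementation: it assumes a single attention head, assumes both branches already share the same embedding width, and uses random weights. The function name `cls_cross_attention` and all dimensions are hypothetical and chosen only for illustration. The idea shown is that the class token of one branch acts as the query and attends over the patch tokens of the other branch.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cls_cross_attention(cls_token, other_patches, w_q, w_k, w_v):
    """Fuse one branch's class token with the other branch's patch tokens.

    cls_token:     (1, d) class token of branch A
    other_patches: (n, d) patch tokens of branch B
    w_q, w_k, w_v: (d, d) projection matrices (illustrative, random here)
    """
    # Keys and values come from the class token plus the other branch's patches.
    context = np.concatenate([cls_token, other_patches], axis=0)  # (n + 1, d)
    q = cls_token @ w_q                                           # (1, d)
    k = context @ w_k                                             # (n + 1, d)
    v = context @ w_v                                             # (n + 1, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])                       # (1, n + 1)
    weights = softmax(scores, axis=-1)
    # Residual connection: the class token is updated, not replaced.
    fused = cls_token + weights @ v                               # (1, d)
    return fused, weights


# Hypothetical sizes, far smaller than the real model.
rng = np.random.default_rng(0)
dim, n_patches = 8, 16
cls_a = rng.normal(size=(1, dim))
patches_b = rng.normal(size=(n_patches, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused_cls, attn = cls_cross_attention(cls_a, patches_b, w_q, w_k, w_v)
print(fused_cls.shape, attn.shape)
```

Because only the single class token issues a query, the cost of this step grows linearly with the number of patch tokens in the other branch rather than quadratically.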

Core Capabilities

  • Image classification on the ImageNet-1k dataset
  • Feature extraction for downstream tasks
  • Efficient processing with reduced computational overhead
  • Support for both classification and embedding generation

Frequently Asked Questions

Q: What makes this model unique?

CrossViT-9's distinctive feature is its cross-attention mechanism that enables effective multi-scale processing while maintaining a compact parameter count. This makes it particularly efficient for real-world applications where computational resources may be limited.

Q: What are the recommended use cases?

The model is well-suited to image classification, particularly on fixed 240x240 resolution images. It can be used directly as a classifier or as a feature extractor that supplies embeddings for transfer learning.
