crossvit_9_240.in1k

Maintained By: timm

CrossViT-9 240 ImageNet Model

Property          Value
Parameter Count   8.55M
Image Size        240x240
License           Apache 2.0
Paper             CrossViT Paper
Dataset           ImageNet-1k

What is crossvit_9_240.in1k?

CrossViT-9 is a vision transformer for image classification built on a cross-attention, multi-scale architecture. The architecture was proposed by IBM researchers in the CrossViT paper. This checkpoint is the lightweight 9-layer variant with 8.55M parameters and ImageNet-1k weights. It takes 240x240 pixel inputs and needs about 1.8 GMACs per image.

Implementation Details

The model uses a dual-branch design. Two transformer branches split the same image into patches of different sizes, so one branch sees fine-grained patches and the other coarse ones. Cross-attention layers let the branches exchange information: each branch's CLS token attends to the other branch's patch tokens. As a result, the final representation combines both scales.
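
A toy sketch of that fusion step follows, in plain Python. It shows one branch's CLS token acting as the only query over its own CLS token plus the other branch's patch tokens. The simplifications are assumptions, not timm's code: both branches share one embedding size, there is a single head, and there are no learned Q/K/V or output projections.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cls_cross_attention(cls_token: Vector, other_patches: List[Vector]) -> Vector:
    """Toy single-head cross-attention with one branch's CLS token as the only query.

    Assumptions (simplifications, not timm's implementation): both branches
    share one embedding size and there are no learned Q/K/V/output
    projections, so keys and values are the raw tokens.
    """
    dim = len(cls_token)
    # Keys/values: this branch's CLS token followed by the other branch's patches.
    keys = [cls_token] + other_patches
    # Scaled dot-product scores between the single query and every key.
    scores = [sum(q * k for q, k in zip(cls_token, key)) / math.sqrt(dim) for key in keys]
    weights = softmax(scores)
    # The updated CLS token is the attention-weighted average of the values.
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


updated_cls = cls_cross_attention([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]])
```

As we understand the CrossViT paper, the real model also projects the CLS token into the other branch's embedding size before attention and projects the result back afterwards. The sketch skips both projections.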

  • Multi-scale processing with cross-attention mechanism
  • Compact architecture with 8.55M parameters
  • About 9.5M activations per forward pass (the activation count timm reports)
  • Optimized for 240x240 resolution inputs
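
The sketch below counts tokens per branch to make the dual-branch layout concrete. It assumes the configuration we believe timm uses for this variant: 12x12 patches on the 240x240 input, and 16x16 patches on a copy rescaled to 224x224, each branch with one CLS token. These sizes are assumptions; this card does not state them.

```python
def tokens_per_branch(image_size: int, patch_size: int) -> int:
    """Tokens in one branch: non-overlapping square patches plus one CLS token."""
    if image_size % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    side = image_size // patch_size
    return side * side + 1


# Assumed configuration for crossvit_9_240 (not stated on this card):
# fine branch: 240x240 input, 12x12 patches -> 20 * 20 + 1 tokens
# coarse branch: input rescaled to 224x224, 16x16 patches -> 14 * 14 + 1 tokens
FINE_BRANCH_TOKENS = tokens_per_branch(240, 12)    # 401
COARSE_BRANCH_TOKENS = tokens_per_branch(224, 16)  # 197
```

The fine branch therefore carries roughly twice as many tokens as the coarse one. Under these assumptions, that imbalance is why the two branches are not simply merged into one long sequence.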

Core Capabilities

  • Image classification over the 1,000 ImageNet-1k classes
  • Feature extraction for downstream tasks such as transfer learning
  • Low compute cost of about 1.8 GMACs per image
  • A single model that outputs either class logits or image embeddings
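
As we understand timm's CrossViT implementation, each branch has its own linear classifier head. Classification averages the two heads' logits, while the embedding path concatenates the per-branch CLS features. The stdlib sketch below mimics that behavior with hypothetical toy weights (branch widths 2 and 1, two classes). The averaging and concatenation are our assumptions, not something this card states.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]         # one row per output class
Head = Tuple[Matrix, Vector]  # (weights, bias) of a linear classifier


def linear(head: Head, x: Vector) -> Vector:
    """Compute W @ x + b for a single feature vector."""
    weights, bias = head
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify(cls_feats: List[Vector], heads: List[Head]) -> Vector:
    """Assumed fusion: run each branch's own head, then average the logits."""
    per_branch = [linear(head, feats) for feats, head in zip(cls_feats, heads)]
    num_classes = len(per_branch[0])
    return [sum(logits[c] for logits in per_branch) / len(per_branch) for c in range(num_classes)]


def embed(cls_feats: List[Vector]) -> Vector:
    """Assumed embedding: concatenate the per-branch CLS features."""
    return [v for feats in cls_feats for v in feats]


# Hypothetical toy values: the two branches have different widths, as in CrossViT.
feats = [[1.0, 2.0], [3.0]]
heads = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]), ([[1.0], [2.0]], [0.0, 1.0])]
logits = classify(feats, heads)  # [2.0, 4.5]
embedding = embed(feats)         # [1.0, 2.0, 3.0]
```

In timm itself, creating a model with `num_classes=0` is the usual way to get embeddings instead of logits.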

Frequently Asked Questions

Q: What makes this model unique?

CrossViT-9's distinctive feature is its cross-attention mechanism, which fuses two patch scales while keeping the parameter count small (8.55M). In the CrossViT design, only each branch's CLS token acts as the query during fusion, so this step grows linearly with the number of tokens rather than quadratically. These properties make the model a reasonable choice when compute or memory is limited.
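
The arithmetic below compares query-key score counts for the fusion step, to show where the savings come from. The baseline, in which every token of both branches attends to every other token, is a hypothetical comparison point. The token counts of 401 and 197 are assumptions: a 240x240 image cut into 12x12 patches, and a 224x224 rescale cut into 16x16 patches, each with one CLS token.

```python
def joint_self_attention_scores(n_a: int, n_b: int) -> int:
    """Hypothetical baseline: every token of both branches attends to every token."""
    n = n_a + n_b
    return n * n


def cls_cross_attention_scores(n_a: int, n_b: int) -> int:
    """Query-key scores when each branch's CLS token is the only query.

    n_a and n_b include one CLS token each. Branch A's CLS attends to itself
    plus branch B's n_b - 1 patches (n_b keys); branch B's CLS attends to n_a keys.
    """
    return n_b + n_a


# Assumed token counts: 401 (20 * 20 + 1) and 197 (14 * 14 + 1).
joint = joint_self_attention_scores(401, 197)  # 357604
fused = cls_cross_attention_scores(401, 197)   # 598
```

The count covers only the cross-branch fusion. Each branch still runs ordinary self-attention over its own tokens.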

Q: What are the recommended use cases?

The model suits image classification, particularly when inputs are resized to its fixed 240x240 resolution. It can serve directly as a classifier, or as a feature extractor whose embeddings feed a downstream model in transfer-learning setups.
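
Because the model expects 240x240 inputs, arbitrary images must be resized and cropped first. The sketch below computes only the geometry of a common evaluation transform: resize the shorter side to `floor(240 / crop_pct)`, then take a 240x240 center crop. The function name and the `crop_pct=0.875` value are assumptions. That value is a common timm default, but this card does not confirm it for this model.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def eval_resize_and_crop(width: int, height: int, crop: int = 240,
                         crop_pct: float = 0.875) -> Tuple[Tuple[int, int], Box]:
    """Geometry of a resize-shorter-side then center-crop evaluation transform.

    Assumption: crop_pct=0.875 is a common timm default, not a value this
    card confirms for crossvit_9_240.in1k.
    """
    short_target = math.floor(crop / crop_pct)  # 274 for crop=240
    short_side = min(width, height)
    # Scale both sides so the shorter one becomes short_target (aspect ratio kept).
    new_w = round(width * short_target / short_side)
    new_h = round(height * short_target / short_side)
    # Center a crop x crop window inside the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


size, box = eval_resize_and_crop(640, 480)  # size (365, 274), box (62, 17, 302, 257)
```

The sketch omits pixel resampling and mean/std normalization. In practice, timm's `timm.data.resolve_model_data_config` and `timm.data.create_transform` helpers build the full pipeline from the model's stored configuration.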
