XCiT Large 24 P8 224

Property	Value
Parameter Count	188.9M
Model Type	Image Classification
License	Apache-2.0
Paper	XCiT: Cross-Covariance Image Transformers
Image Size	224x224
GMACs	141.2

What is xcit_large_24_p8_224.fb_in1k?

xcit_large_24_p8_224.fb_in1k is a Cross-Covariance Image Transformer (XCiT) developed by Facebook Research for image classification. It has 188.9M parameters, uses a patch size of 8 pixels, and operates on 224x224 images.

Implementation Details

This model implements cross-covariance attention (XCA). Standard self-attention computes pairwise interactions between tokens (spatial locations), so its cost grows quadratically with the number of patches. XCA instead attends across feature channels, so its attention map has a fixed size and the cost grows linearly with the number of patches. The model divides each image into 8x8 pixel patches and passes them through 24 transformer layers.

Leverages cross-covariance attention for efficient feature extraction
Optimized for 224x224 input resolution
Features 181.6M activations
Implements Facebook's original XCiT architecture
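
The cross-covariance attention described above can be illustrated with a minimal single-head NumPy sketch. It is not the model's actual implementation: the helper names are hypothetical, the per-head width of 48 is an illustrative value, and `scale` is a plain float standing in for the learned temperature. The point it shows is that the attention map is channels by channels, so its size does not depend on the 784 patches that a 224x224 image produces at patch size 8.

```python
import numpy as np


def num_tokens(image_size: int = 224, patch_size: int = 8) -> int:
    """Number of non-overlapping patches in a square image."""
    side = image_size // patch_size
    return side * side


def l2_normalize(x: np.ndarray, axis: int, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along `axis` to unit L2 norm."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_covariance_attention(q, k, v, scale: float = 1.0):
    """Single-head XCA sketch. q, k, v have shape (tokens, channels).

    Returns the output of shape (tokens, channels) and the
    (channels, channels) attention map.
    """
    # Channel-first layout: each row is one channel across all tokens.
    q_c = l2_normalize(q.T, axis=-1)  # (d, N)
    k_c = l2_normalize(k.T, axis=-1)  # (d, N)
    # Channel-to-channel similarities; the size is d x d regardless of N.
    attn = softmax((q_c @ k_c.T) * scale, axis=-1)  # (d, d)
    # Each output channel is a weighted mix of the value channels.
    out = attn @ v.T  # (d, N)
    return out.T, attn


n = num_tokens(224, 8)  # 28 * 28 patches
d = 48                  # illustrative per-head channel width (assumption)
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, attn = cross_covariance_attention(q, k, v)
print(n, out.shape, attn.shape)
```

Doubling the image side would quadruple the number of tokens, but `attn` would stay 48 x 48; only the matrix products that touch the token axis grow, and they grow linearly.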

Core Capabilities

High-accuracy image classification on ImageNet-1k dataset
Feature extraction for downstream tasks
Attention cost that scales linearly with the number of patches
Support for both classification and embedding generation

Frequently Asked Questions

Q: What makes this model unique?

Its distinguishing feature is cross-covariance attention, which attends across feature channels rather than across tokens and therefore scales linearly with the number of patches. With 188.9M parameters and 8-pixel patches, the model favors accuracy over speed: a single 224x224 forward pass costs 141.2 GMACs.

Q: What are the recommended use cases?

The model is suited to image classification, feature extraction for transfer learning, and generating image embeddings for downstream applications. Because it was trained on ImageNet-1k, it predicts the 1,000 ImageNet classes out of the box; other label sets require fine-tuning.