XCiT Large 24 P8 224
Property | Value |
---|---|
Parameter Count | 188.9M |
Model Type | Image Classification |
License | Apache-2.0 |
Paper | XCiT: Cross-Covariance Image Transformers |
Image Size | 224x224 |
GMACs | 141.2 |
What is xcit_large_24_p8_224.fb_in1k?
xcit_large_24_p8_224.fb_in1k is the large, 24-layer variant of XCiT (Cross-Covariance Image Transformer), an image classification model developed by Facebook Research and trained on ImageNet-1k. It has 188.9M parameters, uses a patch size of 8 pixels, and operates on 224x224 images.
Implementation Details
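This model implements cross-covariance attention (XCA), which differs from standard self-attention by computing attention across feature channels rather than across tokens (spatial locations). The size of the attention map therefore depends on the channel dimension, not on the number of patches, so attention cost grows linearly with the number of tokens instead of quadratically. The model divides each image into 8x8-pixel patches (28x28 = 784 patches for a 224x224 input) and passes them through 24 transformer layers.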
- Leverages cross-covariance attention for efficient feature extraction
- Optimized for 224x224 input resolution
- Features 181.6M activations
- Implements Facebook's original XCiT architecture
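To make the channel-wise attention concrete, here is a minimal NumPy sketch of one cross-covariance attention step. It is an illustration, not the library's implementation: it omits biases, the output projection, and the learned temperature, and it uses random weights. The embedding width (768) and head count (16) are assumptions that do not appear in this card; the token count follows from the 224x224 input and 8-pixel patches.

```python
import numpy as np


def cross_covariance_attention(x, w_q, w_k, w_v, num_heads, tau=1.0):
    """Simplified XCA step (no biases, no output projection).

    x: (N, d) matrix of N patch tokens with d channels.
    Returns the (N, d) output and the per-head attention maps.
    """
    n, d = x.shape
    head_dim = d // num_heads

    def project(w):
        # (N, d) -> (d, N) -> (heads, head_dim, N): channels first, tokens last
        return (x @ w).T.reshape(num_heads, head_dim, n)

    q, k, v = project(w_q), project(w_k), project(w_v)

    # L2-normalise every channel over the token axis
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-12)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-12)

    # Channel-by-channel attention: (heads, head_dim, head_dim), independent of N
    logits = (q @ k.transpose(0, 2, 1)) / tau
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=-1, keepdims=True)

    out = attn @ v  # (heads, head_dim, N)
    return out.reshape(d, n).T, attn


rng = np.random.default_rng(0)
num_tokens = (224 // 8) ** 2      # 28 * 28 = 784 patches
embed_dim, num_heads = 768, 16    # assumed values, not stated in this card
x = rng.standard_normal((num_tokens, embed_dim))
w_q, w_k, w_v = (
    rng.standard_normal((embed_dim, embed_dim)) / np.sqrt(embed_dim)
    for _ in range(3)
)
out, attn = cross_covariance_attention(x, w_q, w_k, w_v, num_heads)
print(out.shape, attn.shape)  # (784, 768) (16, 48, 48)
```

Note that the attention maps are 48x48 per head regardless of how many patches the image produces; only the final `attn @ v` product scales with the token count.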
Core Capabilities
- High-accuracy image classification on ImageNet-1k dataset
- Feature extraction for downstream tasks
- Attention cost that scales linearly with the number of patches, which helps with higher-resolution inputs
- Support for both classification and embedding generation
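The classification and embedding capabilities listed above might be exercised roughly as follows. This is a sketch that assumes the `timm`, `torch`, and `Pillow` packages are installed and that the pretrained weights can be downloaded on first use; the helper names (`_load`, `classify`, `embed`) are hypothetical and introduced only for illustration.

```python
MODEL_NAME = "xcit_large_24_p8_224.fb_in1k"


def _load(num_classes=None):
    """Create the pretrained model and its matching evaluation transform."""
    import timm  # imported lazily: heavy dependency, downloads weights on first use

    kwargs = {} if num_classes is None else {"num_classes": num_classes}
    model = timm.create_model(MODEL_NAME, pretrained=True, **kwargs).eval()
    # Preprocessing (resize, crop, normalisation) is read from the model's own config
    config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**config, is_training=False)
    return model, transform


def classify(image_path, k=5):
    """Return the top-k probabilities and ImageNet-1k class indices."""
    import torch
    from PIL import Image

    model, transform = _load()
    batch = transform(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=-1)
    top = torch.topk(probs, k=k, dim=-1)
    return top.values[0].tolist(), top.indices[0].tolist()


def embed(image_path):
    """Return a feature vector of length model.num_features for one image."""
    import torch
    from PIL import Image

    # num_classes=0 removes the classifier head, so the model returns features
    model, transform = _load(num_classes=0)
    batch = transform(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(batch)[0]
```

For example, `classify("cat.jpg")` (a hypothetical local file) would return two lists of five values each. In real use, load the model once and reuse it rather than calling `_load` per image as this sketch does.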
Frequently Asked Questions
Q: What makes this model unique?
Its cross-covariance attention operates on feature channels rather than on tokens, which keeps attention cost linear in the number of patches. Combined with 188.9M parameters and a small 8-pixel patch size, the model targets high ImageNet-1k accuracy at a substantial compute cost (141.2 GMACs per 224x224 image).
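As a rough back-of-the-envelope check on the efficiency claim, the snippet below compares how many attention-map entries a token-based layer and a channel-based layer would need as the input resolution grows. It counts map entries only, not total FLOPs, and the embedding width (768) and head count (16) are assumed values that this card does not state.

```python
def attention_map_entries(image_size, patch_size=8, embed_dim=768, num_heads=16):
    """Count attention-map entries for one layer under the assumed dimensions."""
    tokens = (image_size // patch_size) ** 2
    head_dim = embed_dim // num_heads
    token_attention = num_heads * tokens * tokens        # N x N map per head
    channel_attention = num_heads * head_dim * head_dim  # (d/h) x (d/h) map per head
    return tokens, token_attention, channel_attention


for size in (224, 384, 512):
    tokens, token_attn, channel_attn = attention_map_entries(size)
    print(f"{size}px: {tokens} tokens, "
          f"token attention {token_attn:,} entries, "
          f"channel attention {channel_attn:,} entries")
```

The channel-attention count stays fixed as the image grows, while the token-attention count grows with the square of the number of patches.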
Q: What are the recommended use cases?
The model is best suited for image classification where accuracy matters more than compute budget, feature extraction for transfer learning, and generating image embeddings for downstream applications. It is a natural fit for ImageNet-1k or similar image classification scenarios.