XCiT Tiny 24 P8 384
| Property | Value |
|---|---|
| Parameter Count | 12.1M |
| Image Size | 384 x 384 |
| License | Apache-2.0 |
| GMACs | 27.1 |
| Paper | XCiT: Cross-Covariance Image Transformers |
What is xcit_tiny_24_p8_384.fb_dist_in1k?
This is an implementation of the Cross-Covariance Image Transformer (XCiT) architecture for image classification. Developed by Facebook AI Research, it is a lightweight variant with 12.1M parameters that operates on 384 x 384 pixel inputs. The model was pre-trained on ImageNet-1k with knowledge distillation (the `dist` tag in its name), which transfers accuracy from a larger teacher model into this compact student.
Implementation Details
The model uses the XCiT architecture, whose cross-covariance attention (XCA) operates across feature channels rather than across tokens, so attention cost grows linearly, not quadratically, with the number of tokens. At 27.1 GMACs and 133.0M activations, it offers a balanced trade-off between computational cost and accuracy.
- Efficient patch-based image processing with an 8 x 8 (P8) patch size
- 24-layer architecture optimized for 384x384 resolution
- Distillation-based training for improved performance
- Support for both classification and feature extraction tasks
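The P8 tokenization at 384 x 384 resolution can be sketched with a single strided convolution; this is a simplification (the actual XCiT model uses a small stack of 3 x 3 convolutions for patch embedding), but the resulting token grid is the same: (384 / 8) x (384 / 8) = 48 x 48 = 2304 tokens.

```python
import torch
import torch.nn as nn

embed_dim = 192  # XCiT-Tiny embedding width
# Simplified patch embedding: one 8x8 strided conv stands in for XCiT's
# convolutional patch-embedding stack, producing the same 48x48 token grid.
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=8, stride=8)

x = torch.randn(1, 3, 384, 384)
tokens = patch_embed(x).flatten(2).transpose(1, 2)  # (1, 2304, 192)
```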
Core Capabilities
- Image classification on ImageNet-1k dataset
- Feature backbone extraction for downstream tasks
- Efficient processing of high-resolution images
- Support for both inference and feature embedding generation
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its cross-covariance attention, which attends over feature channels instead of tokens, keeping the cost of attention linear in the number of tokens even at 384 x 384 resolution while maintaining a relatively small parameter count. The distillation-based training further improves its accuracy-to-size ratio.
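The channel-wise attention idea can be illustrated with a stripped-down, single-head sketch (the real XCA block adds learned query/key/value projections, multiple heads, and a learned per-head temperature):

```python
import torch
import torch.nn.functional as F

def xca(x, temperature=1.0):
    """Simplified single-head cross-covariance attention (XCA) sketch.

    Standard self-attention builds an N x N map over the N tokens; XCA
    instead builds a d x d map over the d feature channels, so its cost
    scales linearly with the number of tokens.
    """
    q = k = v = x  # (B, N, d); the real block uses learned q/k/v projections
    # L2-normalize along the token dimension before the channel-wise product.
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    attn = (q.transpose(1, 2) @ k) * temperature  # (B, d, d) channel map
    attn = attn.softmax(dim=-1)
    return (attn @ v.transpose(1, 2)).transpose(1, 2)  # back to (B, N, d)

out = xca(torch.randn(2, 2304, 192))
```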
Q: What are the recommended use cases?
The model is particularly well-suited for image classification tasks requiring high-resolution input (384 x 384), feature extraction for transfer learning, and scenarios where a balance between model size, accuracy, and computational efficiency is crucial.