xcit_tiny_24_p8_384.fb_dist_in1k


Available through the timm library.

XCiT (Cross-Covariance Image Transformer) image classification model with 12.1M parameters, optimized for 384x384 images with distillation training on ImageNet-1k.

Property         Value
Parameter Count  12.1M
Image Size       384 x 384
License          Apache-2.0
Paper            XCiT: Cross-Covariance Image Transformers
GMACs            27.1

What is xcit_tiny_24_p8_384.fb_dist_in1k?

This is a specialized implementation of the Cross-Covariance Image Transformer (XCiT) architecture, designed for high-performance image classification. Developed by Facebook AI Research, it is a lightweight variant with 12.1M parameters, tuned for 384x384 pixel inputs. The model was pre-trained on ImageNet-1k using knowledge distillation to retain high accuracy at a small model size.

Implementation Details

The model utilizes the innovative XCiT architecture, which introduces cross-covariance attention mechanisms to process image data efficiently. With 27.1 GMACs and 133.0M activations, it offers a balanced trade-off between computational efficiency and model performance.

  • Efficient patch-based image processing with P8 patch size
  • 24-layer architecture optimized for 384x384 resolution
  • Distillation-based training for improved performance
  • Support for both classification and feature extraction tasks

Core Capabilities

  • Image classification on ImageNet-1k dataset
  • Feature backbone extraction for downstream tasks
  • Efficient processing of high-resolution images
  • Support for both inference and feature embedding generation

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its implementation of cross-covariance attention mechanisms, which provide efficient processing of high-resolution images while maintaining a relatively small parameter count. The distillation training approach further enhances its performance-to-size ratio.

Q: What are the recommended use cases?

The model is particularly well-suited for image classification tasks requiring high-resolution input (384x384), feature extraction for transfer learning, and scenarios where a balance between model size and performance is crucial. It's ideal for applications requiring both accuracy and computational efficiency.
