ConvNeXtV2 Atto

Property	Value
Parameters	3.7M
GMACs	0.55
Top-1 Accuracy	76.664%
Image Size	224x224 (train), 288x288 (test)
Paper	ConvNeXt V2: Co-Designing and Scaling ConvNets with Masked Autoencoders

What is convnextv2_atto.fcmae_ft_in1k?

ConvNeXtV2 Atto is the smallest variant in the ConvNeXtV2 family, designed for efficient image classification. It was pretrained with a fully convolutional masked autoencoder (FCMAE) framework and then fine-tuned on ImageNet-1k. With 3.7M parameters and 0.55 GMACs, it reaches 76.664% top-1 accuracy on ImageNet-1k, trading some accuracy for a very small compute and memory footprint.

Implementation Details

The model has 3.7M parameters, requires 0.55 GMACs per forward pass, and has an activation count of 3.8M. It uses 224x224 images during training and 288x288 images during testing.

Fully convolutional architecture optimized for efficiency
FCMAE pretraining for robust feature learning
ImageNet-1k fine-tuning for classification tasks
Efficient processing with minimal computational overhead
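
Because the architecture is fully convolutional, compute and activation counts grow roughly with the number of input pixels. The sketch below estimates the cost of running at the 288x288 test resolution. It assumes the 0.55 GMACs and 3.8M activation figures refer to 224x224 input and that cost scales with the square of the side length; the helper name scale_by_resolution is hypothetical and the results are estimates, not measured values.

```python
def scale_by_resolution(value_at_base: float, base: int = 224, target: int = 288) -> float:
    """Scale a per-image cost that is assumed to grow linearly with pixel count."""
    return value_at_base * (target / base) ** 2


# Figures from this card, assumed to be measured at 224x224.
gmacs_224 = 0.55
activations_224_m = 3.8  # millions

gmacs_288 = scale_by_resolution(gmacs_224)
activations_288_m = scale_by_resolution(activations_224_m)

print(f"Estimated GMACs at 288x288: {gmacs_288:.2f}")                 # 0.91
print(f"Estimated activations at 288x288: {activations_288_m:.1f}M")  # 6.3M
```

Under these assumptions, testing at 288x288 costs about 1.65 times as much compute per image as running at 224x224, so the higher resolution is a choice to weigh on constrained devices.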

Core Capabilities

Image classification with 1000 classes
Feature extraction for downstream tasks
Efficient inference, with a reported throughput of 4728.91 samples/sec (a benchmark figure; actual speed depends on hardware, precision, and batch size)
Balanced trade-off between size and performance

Frequently Asked Questions

Q: What makes this model unique?

This is the most lightweight variant in the ConvNeXtV2 family, which makes it a practical option for resource-constrained applications that still need reasonable accuracy. FCMAE pretraining helps the model learn useful features despite its small size.

Q: What are the recommended use cases?

The model is well suited to mobile and edge devices where computational resources are limited. It fits real-time image classification tasks in which resource efficiency matters more than peak accuracy. Common applications include mobile apps, IoT devices, and embedded systems.
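
Whether the model is fast enough for a real-time use case depends on the target device, so it is worth measuring latency there rather than relying on published throughput. The following sketch shows one way to do that. The helpers measure_latency_ms and meets_budget are hypothetical, and a trivial stand-in workload replaces the model call so the example runs anywhere.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def meets_budget(latency_ms: float, target_fps: float) -> bool:
    """True if one inference fits inside the per-frame time budget."""
    return latency_ms <= 1000.0 / target_fps


# Stand-in workload; replace the lambda with a real model call on the target device.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"median latency: {latency:.3f} ms, fits 30 FPS: {meets_budget(latency, 30)}")
```

To benchmark the actual model, wrap a forward pass on a fixed input in the callable, for example lambda: model(batch) inside a torch.no_grad() context, and run it on the device you plan to deploy to.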
