convnextv2_base.fcmae_ft_in22k_in1k_384

timm

ConvNeXt V2 base model pretrained with FCMAE, then fine-tuned on ImageNet-22k and ImageNet-1k. 88.7M parameters, 384x384 input, 87.6% top-1 accuracy.

Property	Value
Parameter Count	88.7M
Model Type	Image Classification / Feature Backbone
Input Resolution	384 x 384
Top-1 Accuracy	87.646%
GMACs	45.21
Paper	ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

What is convnextv2_base.fcmae_ft_in22k_in1k_384?

This is the base variant of ConvNeXt V2, a convolutional neural network architecture. It was pretrained with a fully convolutional masked autoencoder (FCMAE) framework and then fine-tuned first on ImageNet-22k and afterwards on ImageNet-1k. The model takes 384x384 pixel images as input and reaches 87.646% top-1 accuracy on ImageNet-1k.

Implementation Details

The model has 88.7M parameters and requires 45.2 GMACs (billions of multiply-accumulate operations) per forward pass at 384x384. It produces 84.5M activations during a forward pass, and its reported throughput is 209.51 samples per second at a batch size of 256; the benchmark hardware is not stated here.
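The figures above can be turned into a few rough planning numbers. The sketch below is back-of-the-envelope arithmetic only: it assumes the common convention of 2 FLOPs per multiply-accumulate and 4 bytes per parameter for fp32 weights, neither of which is stated on this page, and the function names are illustrative.

```python
# Published figures for convnextv2_base.fcmae_ft_in22k_in1k_384 (from the table above).
PARAMS = 88.7e6          # parameter count
GMACS = 45.21            # billions of multiply-accumulates per 384x384 image
THROUGHPUT = 209.51      # samples per second at batch size 256
BATCH_SIZE = 256


def approx_gflops(gmacs: float) -> float:
    """Approximate GFLOPs, ASSUMING 1 MAC = 2 floating-point operations."""
    return 2.0 * gmacs


def seconds_per_batch(batch_size: int, throughput: float) -> float:
    """Wall-clock time for one batch implied by the reported throughput."""
    return batch_size / throughput


def ms_per_image(throughput: float) -> float:
    """Amortized per-image time in milliseconds (not single-image latency)."""
    return 1000.0 / throughput


def weight_megabytes(params: float, bytes_per_param: int = 4) -> float:
    """Weight storage in MB, ASSUMING fp32 (4 bytes per parameter) by default."""
    return params * bytes_per_param / 1e6


if __name__ == "__main__":
    print(f"~{approx_gflops(GMACS):.1f} GFLOPs per image")
    print(f"~{seconds_per_batch(BATCH_SIZE, THROUGHPUT):.2f} s per batch of {BATCH_SIZE}")
    print(f"~{ms_per_image(THROUGHPUT):.2f} ms per image (amortized)")
    print(f"~{weight_megabytes(PARAMS):.0f} MB of fp32 weights")
```

The per-image time is amortized over a full batch, so it is a lower bound on the latency of a single request rather than a prediction of it.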

Self-supervised FCMAE pretraining
Hierarchical feature extraction capabilities
Optimized for 384x384 resolution inputs
Dual-stage fine-tuning on ImageNet-22k and ImageNet-1k
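To make "hierarchical feature extraction" concrete, the sketch below computes the feature-map shape expected at each of the four stages for a 384x384 input. It assumes the published ConvNeXt base configuration, which this page does not spell out: stage strides of 4, 8, 16, and 32, and channel widths of 128, 256, 512, and 1024. Treat those constants as assumptions to verify against the loaded model.

```python
# ASSUMED from the ConvNeXt base design (not stated on this page):
# a stride-4 stem followed by three stride-2 downsampling layers.
STAGE_STRIDES = (4, 8, 16, 32)
STAGE_CHANNELS = (128, 256, 512, 1024)


def feature_map_shapes(input_size: int = 384, batch: int = 1):
    """Return the (N, C, H, W) shape expected after each stage."""
    shapes = []
    for stride, channels in zip(STAGE_STRIDES, STAGE_CHANNELS):
        side = input_size // stride  # spatial size shrinks by the cumulative stride
        shapes.append((batch, channels, side, side))
    return shapes


if __name__ == "__main__":
    for stage, shape in enumerate(feature_map_shapes(384)):
        print(f"stage {stage}: {shape}")
```

Each stage halves the spatial resolution of the previous one while doubling the channel count, which is what lets the model serve as a multi-scale backbone for detection or segmentation heads.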

Core Capabilities

High-accuracy image classification
Feature map extraction at multiple scales
Image embedding generation
Transfer learning applications
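The capabilities listed above map onto three ways of creating the model in timm. The following is a minimal sketch, assuming a recent timm release (0.9 or later) with PyTorch and Pillow installed; the helper names (`load`, `classify`, `feature_maps`, `embed`) are hypothetical and exist only for this example. Imports sit inside the functions so the file can be imported without the heavy dependencies present.

```python
MODEL_NAME = "convnextv2_base.fcmae_ft_in22k_in1k_384"


def load(**kwargs):
    """Create the pretrained model in eval mode; kwargs go to timm.create_model."""
    import timm
    return timm.create_model(MODEL_NAME, pretrained=True, **kwargs).eval()


def _preprocess(model, image):
    """Apply the model's own resize/crop/normalize settings to a PIL image."""
    import timm
    cfg = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**cfg, is_training=False)
    return transform(image).unsqueeze(0)  # add a batch dimension


def classify(image, k=5):
    """Return the top-k ImageNet-1k probabilities and class indices."""
    import torch
    model = load()
    with torch.no_grad():
        probs = model(_preprocess(model, image)).softmax(dim=1)
    return torch.topk(probs, k=k)


def feature_maps(image):
    """Return a list with one feature map per stage (multi-scale backbone use)."""
    import torch
    model = load(features_only=True)
    with torch.no_grad():
        return model(_preprocess(model, image))


def embed(image):
    """Return a pooled embedding by removing the classifier (num_classes=0)."""
    import torch
    model = load(num_classes=0)
    with torch.no_grad():
        return model(_preprocess(model, image))


if __name__ == "__main__":
    from PIL import Image
    img = Image.new("RGB", (512, 512))  # placeholder image; substitute a real photo
    values, indices = classify(img)
    print(indices.tolist())
    print([tuple(f.shape) for f in feature_maps(img)])
    print(tuple(embed(img).shape))
```

Each helper reloads the model for clarity; in practice the model and transform would be created once and reused across calls.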

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the ConvNeXt V2 architecture with FCMAE pretraining. It reaches 87.646% top-1 accuracy at a reported throughput of 209.51 samples/sec, and it does so on 384x384 inputs, a higher resolution than the 224x224 commonly used for ImageNet classifiers.

Q: What are the recommended use cases?

The model is suited to image classification, feature extraction, and use as a backbone for transfer learning. It fits best where inputs are high resolution and where accuracy and throughput both matter.
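For the transfer-learning use case, one common starting point is to replace the 1000-class head with a new one and train only that head. The sketch below assumes a recent timm release and PyTorch; `build_finetune_model` and `trainable_parameters` are hypothetical helper names, and freezing the backbone is only one of several reasonable strategies, not a recommendation from the model's authors.

```python
MODEL_NAME = "convnextv2_base.fcmae_ft_in22k_in1k_384"


def build_finetune_model(num_classes: int, freeze_backbone: bool = True):
    """Load pretrained weights with a freshly initialized classifier head."""
    import timm
    # Passing num_classes replaces the ImageNet-1k head with a new linear layer.
    model = timm.create_model(MODEL_NAME, pretrained=True, num_classes=num_classes)
    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False
        # Re-enable gradients for the new classifier only.
        for param in model.get_classifier().parameters():
            param.requires_grad = True
    return model


def trainable_parameters(model):
    """Collect the parameters an optimizer should update."""
    return [p for p in model.parameters() if p.requires_grad]


if __name__ == "__main__":
    import torch
    model = build_finetune_model(num_classes=10)
    optimizer = torch.optim.AdamW(trainable_parameters(model), lr=1e-3)
    n_trainable = sum(p.numel() for p in trainable_parameters(model))
    print(f"training {n_trainable} parameters")
```

Once the new head has converged, unfreezing the backbone with a lower learning rate is a typical next step when the target dataset is large enough.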