edgenext_small.usi_in1k

timm

EdgeNeXt small model optimized for mobile vision: 5.59M parameters, trained on ImageNet-1k with USI distillation, built on an efficient CNN-Transformer hybrid architecture.

Property	Value
Parameter Count	5.59M
Model Type	Image Classification / Feature Backbone
License	MIT
Training Dataset	ImageNet-1k
Architecture	CNN-Transformer Hybrid
Paper	EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications

What is edgenext_small.usi_in1k?

EdgeNeXt Small is a hybrid architecture that combines the strengths of CNNs and Transformers and is designed for mobile vision applications. This variant was trained on ImageNet-1k with the USI (Unified Scheme for ImageNet) methodology, which uses knowledge distillation from a larger teacher model to raise the accuracy of a compact network.

Implementation Details

The model has 5.59M parameters (about 5.6M) and requires 1.3 GMACs per inference. It uses 256x256 images during training and 320x320 images during testing, balancing computational cost against accuracy. Channel width grows stage by stage (48→96→160→304) across the four stages of the network.

Efficient hybrid architecture combining CNN and Transformer elements
Optimized for mobile vision applications
Trained with USI knowledge distillation
Supports feature map extraction at multiple scales

Core Capabilities

Image classification on ImageNet-1k dataset
Feature extraction with multiple output scales
Image embedding generation
Support for both inference and feature backbone usage

Frequently Asked Questions

Q: What makes this model unique?

EdgeNeXt Small stands out for combining CNN and Transformer components in a single architecture designed for mobile applications. USI training, which relies on knowledge distillation, lets it reach competitive ImageNet-1k accuracy with just 5.59M parameters.

Q: What are the recommended use cases?

The model suits mobile vision applications that need efficient image classification, feature extraction, or embedding generation. It is particularly suitable where computational resources are limited but accuracy still matters.