deeplabv3-mobilevit-small

apple

A small MobileViT + DeepLabV3 model for semantic segmentation. It pairs a transformer-based vision backbone with a DeepLabV3 segmentation head, has 6.4M parameters, and reaches 79.1% mIoU on PASCAL VOC.

Property      Value
Parameters    6.4M
Architecture  MobileViT + DeepLabV3
Task          Semantic Segmentation
Dataset       PASCAL VOC
Performance   79.1% mIoU
License       Apple Sample Code License

What is deeplabv3-mobilevit-small?

DeepLabV3 MobileViT Small is an efficient semantic segmentation model that combines the lightweight MobileViT backbone with a DeepLabV3 segmentation head. Developed by Apple, it is a mobile-friendly vision transformer that reaches 79.1% mIoU on PASCAL VOC with only 6.4M parameters.

Implementation Details

The model uses a hybrid architecture that combines conventional convolutional operations with transformer-based processing. Input images are processed at 512x512 resolution, in BGR (not RGB) pixel order, with pixel values normalized to the [0, 1] range. The backbone was pretrained on ImageNet-1k for 300 epochs and then fine-tuned on PASCAL VOC.
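The input convention described above (512x512, BGR order, values in [0, 1]) can be sketched in a few lines. This is only an illustration, not the model's actual preprocessing code: the function name `preprocess` is hypothetical, the center crop is an assumption about how the 512x512 input is obtained, and nested Python lists stand in for the array type a real pipeline would use.

```python
def preprocess(rgb_image, crop_size=512):
    """Center-crop an H x W x 3 RGB image (nested lists, values 0-255),
    reorder channels to BGR, and scale values to the [0, 1] range.

    Assumption: the image has already been resized so that both sides
    are at least `crop_size` pixels long.
    """
    height = len(rgb_image)
    width = len(rgb_image[0])
    if height < crop_size or width < crop_size:
        raise ValueError("image is smaller than the crop size; resize it first")

    top = (height - crop_size) // 2
    left = (width - crop_size) // 2

    output = []
    for row in rgb_image[top:top + crop_size]:
        output.append([
            # Swap RGB -> BGR and divide by 255 to land in [0, 1].
            [b / 255.0, g / 255.0, r / 255.0]
            for r, g, b in row[left:left + crop_size]
        ])
    return output
```

In practice an array library or the image processor shipped with the model would do this work; the point here is only the order of operations and the channel swap, which is easy to miss when images are loaded as RGB.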

  • Multi-scale training at resolutions from 160x160 to 320x320
  • Trained on 8 NVIDIA GPUs with a batch size of 1024
  • Cosine annealing learning rate schedule
  • Label smoothing and L2 weight decay
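Two of the training ingredients listed above, cosine annealing and label smoothing, are simple enough to write out. The following is a minimal sketch of the standard formulas, not the authors' training code: the function names are hypothetical, and the peak learning rate and smoothing factor used in the example calls are placeholder values, since the page does not state them.

```python
import math


def cosine_annealing_lr(step, total_steps, max_lr, min_lr=0.0):
    """Learning rate at `step`, decaying from max_lr to min_lr along a half cosine."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def smooth_labels(target_class, num_classes, smoothing=0.1):
    """Soft target distribution for one sample.

    The smoothing mass is spread uniformly over all classes, and the
    true class keeps the remaining 1 - smoothing on top of its share.
    """
    uniform = smoothing / num_classes
    return [
        (1.0 - smoothing) + uniform if index == target_class else uniform
        for index in range(num_classes)
    ]


# Placeholder hyperparameters, chosen only for illustration.
schedule = [cosine_annealing_lr(s, 100, max_lr=0.002) for s in (0, 50, 100)]
targets = smooth_labels(target_class=1, num_classes=4, smoothing=0.1)
```

The schedule starts at the peak rate, passes through half of it at the midpoint, and ends at the minimum rate. The smoothed targets still sum to 1, so they can be used directly in a cross-entropy loss.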

Core Capabilities

  • Efficient semantic segmentation for mobile applications
  • Global processing using transformers combined with local convolution operations
  • No requirement for positional embeddings
  • Easy integration into existing CNN architectures
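One way to picture how global transformer processing can coexist with local convolutions, without positional embeddings, is the patch-based unfold and fold rearrangement used by MobileViT-style blocks. The sketch below is an illustration under assumptions, not Apple's implementation: it works on a single-channel grid of nested lists, the function names `unfold` and `fold` are hypothetical, and the transformer that would run on each sequence between the two steps is omitted.

```python
def unfold(feature_map, patch_h, patch_w):
    """Rearrange an H x W grid into P sequences of N values each.

    P = patch_h * patch_w pixel positions inside a patch,
    N = number of non-overlapping patches in the grid.
    Sequence p holds the pixel at position p from every patch, so
    attention over one sequence relates the same pixel across patches.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % patch_h or width % patch_w:
        raise ValueError("height and width must be divisible by the patch size")

    sequences = [[] for _ in range(patch_h * patch_w)]
    for patch_row in range(0, height, patch_h):
        for patch_col in range(0, width, patch_w):
            for dy in range(patch_h):
                for dx in range(patch_w):
                    sequences[dy * patch_w + dx].append(
                        feature_map[patch_row + dy][patch_col + dx]
                    )
    return sequences


def fold(sequences, height, width, patch_h, patch_w):
    """Inverse of unfold: put every value back at its original location."""
    feature_map = [[None] * width for _ in range(height)]
    patches_per_row = width // patch_w
    for position, sequence in enumerate(sequences):
        dy, dx = divmod(position, patch_w)
        for patch_index, value in enumerate(sequence):
            patch_row, patch_col = divmod(patch_index, patches_per_row)
            feature_map[patch_row * patch_h + dy][patch_col * patch_w + dx] = value
    return feature_map
```

Because unfold and fold keep a fixed ordering of patches and of pixels within each patch, the spatial layout survives the round trip. Under this reading, that preserved ordering is what allows the block to skip positional embeddings and to hand its output straight back to ordinary convolutional layers.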

Frequently Asked Questions

Q: What makes this model unique?

This model combines MobileNetV2-style convolutional layers with transformer blocks, which lets it model global context while staying efficient. It reaches 79.1% mIoU on PASCAL VOC with only 6.4M parameters, which makes it well suited to mobile applications.
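For readers unfamiliar with the metric, mean intersection over union (mIoU) averages, over classes, the overlap between predicted and ground-truth regions divided by their union. The following is a minimal sketch of that definition, not the evaluation script behind the 79.1% figure: the function name is hypothetical, label maps are nested lists of class indices, and treating label 255 as "ignore" is an assumption based on the usual PASCAL VOC convention.

```python
def mean_iou(predicted, target, num_classes, ignore_label=255):
    """Mean IoU between two label maps given as nested lists of class ids.

    Pixels whose ground-truth label equals `ignore_label` are skipped.
    Classes that appear in neither map are left out of the average.
    """
    intersections = [0] * num_classes
    unions = [0] * num_classes

    for pred_row, target_row in zip(predicted, target):
        for pred, true in zip(pred_row, target_row):
            if true == ignore_label:
                continue
            if pred == true:
                # Correct pixel: counts toward both intersection and union.
                intersections[true] += 1
                unions[true] += 1
            else:
                # Wrong pixel: enlarges the union of both classes involved.
                unions[pred] += 1
                unions[true] += 1

    ious = [i / u for i, u in zip(intersections, unions) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```

Benchmark numbers are usually computed by accumulating the intersection and union counts over the whole validation set before dividing, rather than averaging per-image scores, so a per-image value from this function is not directly comparable to the reported figure.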

Q: What are the recommended use cases?

The model suits mobile and edge-device applications that need semantic segmentation under tight compute budgets, such as real-time scene understanding, autonomous systems, and mobile photography, where resources are limited but accuracy still matters.
