EfficientViT-SAM
| Property | Value |
|---|---|
| License | Apache 2.0 |
| Paper | View Paper |
| Tags | Mask Generation, ONNX |
What is EfficientViT-SAM?
EfficientViT-SAM is an efficiency-focused variant of the Segment Anything Model (SAM), designed to deliver much faster segmentation without sacrificing accuracy. Developed by MIT-HAN-Lab, it is released in multiple variants optimized for different use cases, from the lightweight L0 to the larger XL1.
Implementation Details
The model comes in five variants (L0, L1, L2, XL0, XL1) supporting input resolutions from 512x512 to 1024x1024. The smallest variant, L0, has 34.8M parameters, while the largest, XL1, scales up to 203.3M. Reported accuracy is strong as well: the XL1 variant reaches 47.8 mAP on COCO and 44.4 mAP on LVIS. A short usage sketch follows the list below.
- Optimized for NVIDIA platforms including Jetson AGX Orin and A100 GPU
- TensorRT implementation with fp16 support
- Efficient architecture delivering high throughput (up to 762 images/s for L0)
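As a concrete illustration of how a variant might be loaded and prompted, the sketch below runs point-prompted segmentation through a SAM-style predictor. The module paths and the names `create_efficientvit_sam_model` and `EfficientViTSamPredictor` are assumptions modeled on the mit-han-lab/efficientvit repository, not guaranteed API; verify them against the actual codebase before use.

```python
# Hedged sketch: point-prompted segmentation with an EfficientViT-SAM variant.
# Loader and predictor names are assumptions based on mit-han-lab/efficientvit;
# check that repository for the exact module paths and function signatures.
import numpy as np
from PIL import Image

from efficientvit.sam_model_zoo import create_efficientvit_sam_model       # assumed
from efficientvit.models.efficientvit.sam import EfficientViTSamPredictor  # assumed

# Pick a variant: L0 (512x512, 34.8M params) for edge devices,
# XL1 (1024x1024, 203.3M params) when accuracy matters most.
model = create_efficientvit_sam_model("efficientvit-sam-l0", pretrained=True).eval()
predictor = EfficientViTSamPredictor(model)

image = np.asarray(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# Single foreground point prompt at (x, y); label 1 = foreground, 0 = background.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[scores.argmax()]  # boolean HxW mask of the highest-scoring proposal
```

The interface here mirrors the original SAM predictor, so if the assumed names hold, existing SAM prompting code should only need the model-loading lines changed.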
Core Capabilities
- Fast inference times (8.2ms to 37.2ms on Jetson Orin)
- Automatic mask generation
- Flexible deployment options with ONNX support (see the ONNX Runtime sketch after this list)
- Scalable architecture for different performance requirements
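Since ONNX is one of the advertised deployment paths, below is a minimal sketch of driving an exported EfficientViT-SAM image encoder with ONNX Runtime. The file name, the single-output assumption, and the dummy 512x512 input are placeholders; the real tensor names, preprocessing, and resolution depend on the specific export you use.

```python
# Hedged sketch: running an exported EfficientViT-SAM image encoder with ONNX Runtime.
# The .onnx file name and input layout are placeholders for whatever export you have.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "efficientvit_sam_l0_encoder.onnx",  # placeholder path to an exported encoder
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name  # discover the input tensor name at runtime
# L0-L2 variants take 512x512 inputs; XL0/XL1 take 1024x1024 (per the notes above).
dummy_image = np.random.rand(1, 3, 512, 512).astype(np.float32)

outputs = session.run(None, {input_name: dummy_image})
print("first output shape:", outputs[0].shape)
```

For the fp16 TensorRT path mentioned above, the same ONNX export can typically be built into an engine with trtexec (e.g. `trtexec --onnx=... --fp16`); exact flags depend on your TensorRT version.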
Frequently Asked Questions
Q: What makes this model unique?
EfficientViT-SAM stands out for its optimized architecture that maintains high accuracy while significantly reducing inference time. It's particularly notable for achieving state-of-the-art performance on both COCO and LVIS benchmarks while offering various model sizes for different deployment scenarios.
Q: What are the recommended use cases?
The model is ideal for real-time segmentation tasks, particularly in scenarios requiring high throughput or operating under resource constraints. The L0-L2 variants are suitable for edge devices and applications requiring 512x512 resolution, while XL0-XL1 variants are better for higher-resolution (1024x1024) applications requiring maximum accuracy.
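To make that recommendation concrete, here is a small, illustrative helper that encodes only the figures quoted on this card (resolutions for all five variants; parameter counts are listed only for L0 and XL1, so the others are left as None rather than guessed) and picks a variant from two simple requirements. The selection rule itself is just a convention for this sketch, not part of the model.

```python
# Illustrative variant picker built only from the figures quoted on this card.
# Parameter counts for L1, L2, and XL0 are not listed above, so they stay None.
VARIANTS = {
    "L0":  {"resolution": 512,  "params_m": 34.8},
    "L1":  {"resolution": 512,  "params_m": None},
    "L2":  {"resolution": 512,  "params_m": None},
    "XL0": {"resolution": 1024, "params_m": None},
    "XL1": {"resolution": 1024, "params_m": 203.3},
}

def pick_variant(need_high_res: bool, prefer_lightweight: bool) -> str:
    """Mirror the recommendation above: L0-L2 for 512x512 / edge workloads,
    XL0-XL1 for 1024x1024 workloads where accuracy matters most."""
    resolution = 1024 if need_high_res else 512
    candidates = [name for name, spec in VARIANTS.items()
                  if spec["resolution"] == resolution]
    return candidates[0] if prefer_lightweight else candidates[-1]

print(pick_variant(need_high_res=False, prefer_lightweight=True))   # -> L0
print(pick_variant(need_high_res=True,  prefer_lightweight=False))  # -> XL1
```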