MambaVision-L3-256-21K

Maintained By
nvidia

Parameters: 739.6M
FLOPs: 122.3G
Resolution: 256x256
Top-1 Accuracy: 87.3%
License: NVIDIA Source Code License-NC
Model Hub: Hugging Face

What is MambaVision-L3-256-21K?

MambaVision-L3-256-21K is a hybrid computer vision backbone from NVIDIA that combines Mamba-based (state-space) mixing with Transformer self-attention. The hybrid design aims to make visual feature modeling more efficient without sacrificing accuracy. The model was pretrained on ImageNet-21K and then fine-tuned on ImageNet-1K, reaching 87.3% top-1 accuracy at 256x256 resolution.
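As a usage sketch, the checkpoint can be loaded through Hugging Face `transformers` with `trust_remote_code=True`, since the architecture ships custom modeling code. The class name and output key below follow the pattern used on NVIDIA's MambaVision model cards and should be treated as assumptions for this checkpoint; the random tensor stands in for a preprocessed image.

```python
# Sketch: ImageNet-1K classification with MambaVision-L3-256-21K.
# The heavy model load is guarded under __main__ so the pure-Python
# helper can be exercised without downloading the 739.6M-param weights.

def top1_index(logits):
    """Index of the highest-scoring class in a flat list of logits."""
    return max(range(len(logits)), key=lambda i: logits[i])

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForImageClassification

    model = AutoModelForImageClassification.from_pretrained(
        "nvidia/MambaVision-L3-256-21K", trust_remote_code=True
    ).eval()
    # Stand-in for a normalized 256x256 RGB image (batch of one).
    pixels = torch.randn(1, 3, 256, 256)
    with torch.no_grad():
        logits = model(pixels)["logits"][0]  # assumed output key
    print("predicted class index:", top1_index(logits.tolist()))
```

In practice the input should be resized to 256x256 and normalized with the ImageNet statistics the model was trained with before the forward pass.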

Implementation Details

The model features a hierarchical architecture that integrates both Mamba's efficient processing capabilities and Transformer's self-attention mechanisms. A key innovation is the inclusion of self-attention blocks in the final layers, specifically designed to capture long-range spatial dependencies effectively.

  • Hierarchical architecture with multiple processing stages
  • Integration of self-attention blocks in final layers
  • Support for 256x256 resolution input images
  • Flexible feature extraction capabilities across 4 stages
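The 4-stage hierarchy above implies progressively downsampled feature maps. The small sketch below illustrates this; the stride-4/8/16/32 layout is an assumption common to hierarchical vision backbones, not something stated on this card.

```python
def stage_resolutions(input_size, strides=(4, 8, 16, 32)):
    """Spatial side length of each stage's feature map, assuming the
    stride-4/8/16/32 layout common to hierarchical backbones."""
    return [input_size // s for s in strides]

# For a 256x256 input, the four stages would produce
# 64x64, 32x32, 16x16, and 8x8 feature maps.
print(stage_resolutions(256))  # → [64, 32, 16, 8]
```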

Core Capabilities

  • Image Classification with 87.3% top-1 accuracy
  • Feature extraction with multiple output stages
  • Support for various input resolutions
  • State-of-the-art performance-throughput trade-off

Frequently Asked Questions

Q: What makes this model unique?

The model's hybrid architecture combines Mamba's efficiency with the Transformer's ability to capture long-range dependencies, and the MambaVision family establishes a new state-of-the-art Pareto front for ImageNet-1K accuracy versus throughput. According to its authors, it is among the first architectures to successfully merge Mamba and Transformer blocks for computer vision tasks.

Q: What are the recommended use cases?

The model excels in both image classification tasks and feature extraction. It's particularly well-suited for applications requiring high accuracy and efficient processing of visual data at 256x256 resolution, with the flexibility to extract features at multiple stages of the processing pipeline.
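For the feature-extraction use case, other MambaVision model cards expose the backbone through `AutoModel`, returning a pooled vector alongside the per-stage feature maps; that return signature is an assumption for this checkpoint. The global-average-pooling step that produces the pooled vector is shown as a small pure-Python helper.

```python
# Sketch: multi-stage feature extraction. The (pooled, per-stage features)
# return signature follows other MambaVision model cards and is an
# assumption for MambaVision-L3-256-21K.

def global_avg_pool(feature_map):
    """Average a [channels][h][w] nested list to one value per channel,
    mimicking the pooled vector a backbone returns per image."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]

if __name__ == "__main__":
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        "nvidia/MambaVision-L3-256-21K", trust_remote_code=True
    ).eval()
    pixels = torch.randn(1, 3, 256, 256)  # stand-in for a real image
    with torch.no_grad():
        pooled, stage_features = model(pixels)  # assumed signature
    print(pooled.shape, [f.shape for f in stage_features])
```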
