InternViT-300M-448px

Maintained By: OpenGVLab


Property          Value
Parameter Count   304M
License           MIT
Image Size        448x448
Tensor Type       BF16
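As a rough sanity check on the 304M figure, the arithmetic below counts the weights of a plain Vision Transformer encoder. The configuration it uses (24 layers, hidden size 1024, MLP size 4096, 14×14 patches, a class token, learned position embeddings, and LayerScale vectors) is an assumption on our part and is not stated on this page; under those assumptions the total lands close to the listed parameter count.

```python
# Rough parameter count for a plain ViT encoder.
# ASSUMPTION: the default hyperparameters below are illustrative guesses for
# InternViT-300M-448px, not values taken from the model card.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus bias of one dense layer."""
    return d_in * d_out + d_out

def vit_params(layers=24, hidden=1024, mlp=4096, patch=14, image=448, channels=3) -> int:
    num_patches = (image // patch) ** 2                      # 32 * 32 = 1024 patches per tile
    patch_embed = channels * patch * patch * hidden + hidden  # conv patch stem with bias
    cls_token = hidden
    pos_embed = (num_patches + 1) * hidden                   # one slot per patch plus the class token
    per_layer = (
        linear_params(hidden, 3 * hidden)  # fused Q, K, V projection
        + linear_params(hidden, hidden)    # attention output projection
        + linear_params(hidden, mlp)       # MLP up-projection
        + linear_params(mlp, hidden)       # MLP down-projection
        + 2 * 2 * hidden                   # two LayerNorms (scale + shift)
        + 2 * hidden                       # two LayerScale vectors
    )
    return patch_embed + cls_token + pos_embed + layers * per_layer

total = vit_params()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")  # 304,012,288 parameters (~304M)
```

Changing any of the assumed hyperparameters changes the total; the point is only that an encoder of this width and depth comes out near the stated 304M.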

What is InternViT-300M-448px?

InternViT-300M-448px is a compact vision foundation model developed by OpenGVLab through knowledge distillation from the larger InternViT-6B-448px-V1-5. It retains strong visual representations, including OCR ability, while cutting the parameter count to 304M, a small fraction of its 6B-parameter teacher, which makes it practical on more modest hardware.
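This page does not say which distillation objective was used. Purely as a generic illustration, and not the authors' recipe, feature distillation is often set up so that a small student, after a learned projection up to the teacher's feature width, matches the teacher's per-token features under a cosine-distance loss. The sketch below only evaluates such a loss in plain Python; the projection matrix, the tiny dimensions, and all function names are hypothetical.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cos(a, b); 0.0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def project(features: list[float], weight: list[list[float]]) -> list[float]:
    """Hypothetical linear head mapping a student token (width d_s) to teacher width d_t.

    `weight` has d_t rows of d_s values each.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weight]

def distillation_loss(student_tokens, teacher_tokens, weight) -> float:
    """Mean cosine distance between projected student tokens and teacher tokens."""
    losses = [
        cosine_distance(project(s, weight), t)
        for s, t in zip(student_tokens, teacher_tokens)
    ]
    return sum(losses) / len(losses)

student = [[1.0, 0.0], [0.0, 1.0]]             # two tokens, student width 2
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical 3x2 projection
aligned = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]   # teacher tokens pointing the same way
print(distillation_loss(student, aligned, weight))  # 0.0
```

In actual training this loss would be minimized by gradient descent over the student and projection weights; the sketch only shows what is being compared.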

Implementation Details

The model uses a dynamic input resolution built from 448×448 tiles: an image is split into 1 to 12 tiles during training and into 1 to 40 tiles during inference. It is a Vision Transformer whose weights are released in BF16, which halves weight memory relative to FP32 while keeping FP32's exponent range.
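The tile-count limits above can be illustrated with a small grid-selection sketch. It is hypothetical: the function names and the tie-breaking rule are our own illustration, and the model's actual preprocessing may differ. Among all cols×rows grids whose tile count lies within [min_tiles, max_tiles], it picks the grid whose aspect ratio best matches the image, then lists the 448×448 crop boxes on the image resized to that grid.

```python
# Hypothetical sketch of dynamic tiling: pick a cols x rows grid of
# 448x448 tiles whose aspect ratio best matches the input image.
# ASSUMPTION: names and tie-breaking are illustrative, not an official API.

TILE = 448

def candidate_grids(min_tiles: int, max_tiles: int) -> list[tuple[int, int]]:
    """All (cols, rows) with min_tiles <= cols * rows <= max_tiles, fewest tiles first."""
    grids = {
        (c, r)
        for c in range(1, max_tiles + 1)
        for r in range(1, max_tiles + 1)
        if min_tiles <= c * r <= max_tiles
    }
    return sorted(grids, key=lambda g: (g[0] * g[1], g[0], g[1]))

def choose_grid(width: int, height: int, min_tiles: int = 1, max_tiles: int = 12) -> tuple[int, int]:
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(min_tiles, max_tiles):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * TILE * TILE * cols * rows:
            # On a tie, move to the larger grid only if the image has enough
            # pixels to fill at least half of it (avoids heavy upsampling).
            best = (cols, rows)
    return best

def tile_boxes(width: int, height: int, min_tiles: int = 1, max_tiles: int = 12):
    """Crop boxes (left, top, right, bottom) on the image resized to cols*TILE x rows*TILE."""
    cols, rows = choose_grid(width, height, min_tiles, max_tiles)
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]

print(choose_grid(896, 448))                      # (2, 1)
print(len(tile_boxes(4480, 4480, max_tiles=12)))  # 9
print(len(tile_boxes(4480, 4480, max_tiles=40)))  # 36
```

In this sketch, raising max_tiles from the training limit of 12 to the inference limit of 40 lets a large square image use a 6×6 grid instead of 3×3, so more of its detail survives at 448×448 per tile.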

  • Pretrained on multiple datasets including LAION-en, LAION-zh, COYO, and specialized OCR datasets
  • Supports dynamic multi-tile processing for handling various image sizes
  • Distilled from InternViT-6B-448px-V1-5 to reduce size while preserving feature quality
  • Weights released in bfloat16 (BF16) precision
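To make the BF16 point concrete, the sketch below estimates weight memory for 304M parameters at 2 bytes each (BF16) versus 4 bytes (FP32), and shows how rounding a float32 value to bfloat16, which keeps the sign, all 8 exponent bits, and only 7 mantissa bits, loses precision. It uses only the standard library; the helper names are illustrative, and the estimate covers weights only, not activations or framework overhead.

```python
import struct

PARAMS = 304_000_000  # parameter count listed on the model card

def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Memory needed to store the weights alone (no activations or buffers)."""
    return params * bytes_per_param

def to_bfloat16(x: float) -> float:
    """Round a finite float32 value to bfloat16 (round half to even) and widen it back.

    bfloat16 is the top 16 bits of a float32. NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                     # lowest bit that survives truncation
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000  # round half to even, drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(weight_bytes(PARAMS, 4) / 2**30)  # FP32 weights, in GiB
print(weight_bytes(PARAMS, 2) / 2**30)  # BF16 weights, in GiB
print(to_bfloat16(3.14159))             # 3.140625
```

Because BF16 keeps the full FP32 exponent, large and small magnitudes stay representable; what is lost is mantissa precision, as the rounding of 3.14159 to 3.140625 shows.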

Core Capabilities

  • High-quality image feature extraction
  • Robust OCR capability inherited from the parent model
  • Efficient processing of high-resolution images
  • Flexible tile-based processing system

Frequently Asked Questions

Q: What makes this model unique?

It pairs a relatively small 304M parameter count with features distilled from the much larger InternViT-6B-448px-V1-5. Its dynamic-resolution, multi-tile input handling lets it work with images of different sizes and aspect ratios, which makes it versatile across vision tasks.

Q: What are the recommended use cases?

The model is well suited to image feature extraction, OCR-related tasks, and other scenarios that require high-resolution image processing. It is a particularly good fit when computational resources are limited but strong visual features are still needed.
