StreetCLIP

Maintained by: geolocal

Property       Value
Base Model     CLIP ViT-Large-Patch14-336
License        CC-BY-NC-4.0
Paper          arXiv:2302.00275
Training Data  1.1M street-level images from 101 countries

What is StreetCLIP?

StreetCLIP is a foundation model for open-domain image geolocalization and geographic analysis. Built on OpenAI's CLIP architecture, it was trained on 1.1 million geo-tagged street-level images of urban and rural scenes, and it is reported to achieve state-of-the-art zero-shot performance on geographic classification tasks.

Implementation Details

The model uses a Vision Transformer (ViT) image encoder that splits 336x336-pixel input images into 14x14-pixel patches. It is pretrained with synthetic captions, which is what gives it its zero-shot capability in geographic contexts. Training ran for 3 epochs on 3 NVIDIA A100 GPUs using the AdamW optimizer with a learning rate of 1e-6.
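The patch and input sizes above determine how many tokens the image encoder processes. The arithmetic below is a small worked example; it assumes a square 336x336 input and one prepended class token, as in standard CLIP ViT encoders.

```python
# Token-count arithmetic for a ViT with 14x14-pixel patches on a 336x336 input.
IMAGE_SIZE = 336  # input resolution in pixels per side (assumed square)
PATCH_SIZE = 14   # patch edge length in pixels

patches_per_side = IMAGE_SIZE // PATCH_SIZE
num_patches = patches_per_side ** 2

# Assumption: one class token is prepended to the patch sequence.
sequence_length = num_patches + 1

print(patches_per_side, num_patches, sequence_length)  # 24 576 577
```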

  • Zero-shot classification architecture
  • Domain-specific caption template training
  • Hierarchical linear probing for evaluation
  • Reported to outperform supervised models trained on millions of images on open-domain geolocalization benchmarks
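The caption-template and hierarchical ideas in the list above can be sketched as a two-stage prediction: choose a country first, then choose a city within that country. This is a minimal illustration and not the authors' implementation. The template wording, the `hierarchical_predict` function, and the `toy_scorer` stand-in are all hypothetical; a real scorer would return image-text similarity scores from the model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical caption templates; the exact training wording is not given here.
COUNTRY_TEMPLATE = "A street-level photo taken in {country}."
CITY_TEMPLATE = "A street-level photo taken in {city}, {country}."

# A scorer maps a list of captions to one similarity score per caption.
Scorer = Callable[[Sequence[str]], List[float]]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def hierarchical_predict(
    score_captions: Scorer, cities_by_country: Dict[str, List[str]]
) -> Tuple[str, str]:
    """Pick the best country, then the best city inside that country."""
    countries = list(cities_by_country)
    country_captions = [COUNTRY_TEMPLATE.format(country=c) for c in countries]
    country = countries[argmax(score_captions(country_captions))]

    cities = cities_by_country[country]
    city_captions = [CITY_TEMPLATE.format(city=c, country=country) for c in cities]
    city = cities[argmax(score_captions(city_captions))]
    return country, city


def toy_scorer(captions: Sequence[str]) -> List[float]:
    """Stand-in for model similarity: rewards captions naming Japan or Osaka."""
    return [float(("Japan" in c) + 2 * ("Osaka" in c)) for c in captions]


candidates = {"France": ["Paris", "Lyon"], "Japan": ["Tokyo", "Osaka"]}
result = hierarchical_predict(toy_scorer, candidates)
print(result)  # ('Japan', 'Osaka')
```

Restricting the second stage to cities in the predicted country keeps the candidate list short, at the cost of making a wrong country choice unrecoverable.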

Core Capabilities

  • Geographic location prediction from street-level imagery
  • Urban and rural scene understanding
  • Building type and quality analysis
  • Infrastructure assessment
  • Environmental monitoring and vegetation mapping

Frequently Asked Questions

Q: What makes this model unique?

StreetCLIP's distinctive feature is zero-shot geographic classification: it can classify locations it was never explicitly trained to recognize. This comes from its synthetic caption pretraining and from training on diverse street-level imagery.
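A minimal sketch of zero-shot classification with the Hugging Face `transformers` CLIP classes follows. It assumes the checkpoint is published on the Hub as `geolocal/StreetCLIP` and that `torch`, `Pillow`, and `transformers` are installed. The helper names `rank_labels` and `classify` are illustrative and not part of any library.

```python
import math
from typing import List, Sequence, Tuple

MODEL_ID = "geolocal/StreetCLIP"  # assumed Hugging Face Hub id


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels: Sequence[str], logits: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each label with its probability, most likely first."""
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def classify(image_path: str, labels: Sequence[str]) -> List[Tuple[str, float]]:
    """Score an image against candidate labels without any fine-tuning."""
    # Heavy imports are local so the helpers above work without these packages.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(MODEL_ID)
    processor = CLIPProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=list(labels), images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has shape (num_images, num_labels); take the single image.
    logits = outputs.logits_per_image[0].tolist()
    return rank_labels(labels, logits)
```

For example, `classify("street.jpg", ["Paris", "Tokyo", "Nairobi"])` would return the three labels ordered by probability, where `street.jpg` is a placeholder path. The candidate labels are supplied at inference time, which is what makes the classification zero-shot.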

Q: What are the recommended use cases?

Recommended applications include urban planning, infrastructure assessment, environmental monitoring, and general geographic analysis. The model is particularly suited to analyzing building quality and road conditions, mapping vegetation, and assessing the impact of natural disasters.
