DFN5B-CLIP-ViT-H-14


CLIP model trained on 5B images filtered from a pool of 43B image-text pairs, reaching 83.44% accuracy on ImageNet-1K. It uses a ViT-H-14 architecture, and its training data was curated by Data Filtering Networks.

Property: Value
Author: Apple
License: Apple Sample Code License
Paper: Data Filtering Networks
Training Data: 5B filtered images from 43B pairs
ImageNet Accuracy: 83.44%

What is DFN5B-CLIP-ViT-H-14?

DFN5B-CLIP-ViT-H-14 is a CLIP model developed by Apple whose training data was curated automatically by Data Filtering Networks (DFNs). The model was trained on 5 billion images filtered from a pool of 43 billion uncurated image-text pairs, drawn from CommonPool-12.8B and additional public datasets.
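
To make the filtering step concrete, here is a minimal sketch of score-based curation. It assumes the filtering network assigns each image-text pair a single quality score and that the highest-scoring fraction is kept; the function name, the toy pool, and the scores are hypothetical and are not taken from Apple's pipeline. The only figure grounded in the numbers above is the overall retention rate of roughly 5B out of 43B pairs.

```python
def keep_top_fraction(scored_pairs, fraction):
    """Return the ids of the highest-scoring pairs (hypothetical helper).

    scored_pairs: list of (pair_id, score) tuples, where score is assumed
    to come from a filtering network. fraction: share of the pool to keep.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(scored_pairs) * fraction))
    ranked = sorted(scored_pairs, key=lambda pair: pair[1], reverse=True)
    return [pair_id for pair_id, _ in ranked[:k]]


# Overall retention implied by the figures above: 5B kept out of 43B.
retention = 5e9 / 43e9

# Toy pool with made-up scores; a larger fraction is used here only so
# that the tiny example keeps more than one pair.
pool = [
    ("pair-a", 0.31),
    ("pair-b", 0.05),
    ("pair-c", 0.42),
    ("pair-d", 0.12),
    ("pair-e", 0.27),
]
kept = keep_top_fraction(pool, fraction=0.4)
print(f"retention ~ {retention:.3f}, kept = {kept}")
```

Ranking by score and keeping a fixed share is only one possible selection rule; a fixed score threshold would be an equally plausible reading of "filtering".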

Implementation Details

The model uses a ViT-H-14 architecture, and its weights have been converted from JAX to PyTorch for wider accessibility. It is trained with a contrastive image-text objective and is intended for zero-shot image classification.

  • Trained on data curated by Data Filtering Networks (DFNs)
  • Provides both image and text encoders
  • Compatible with OpenCLIP framework
  • Achieves 83.44% accuracy on ImageNet-1K
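
Because the weights are compatible with OpenCLIP, loading them can be sketched as below. This is a hedged example rather than official usage: the Hub identifier `hf-hub:apple/DFN5B-CLIP-ViT-H-14` and the tokenizer name `ViT-H-14` are assumptions not stated above, and the helper names `load_dfn5b` and `embed` are hypothetical. It requires the `open_clip_torch`, `torch`, and `Pillow` packages, and the first call downloads the full checkpoint.

```python
# Assumed identifiers; verify them against the model's hosting page.
MODEL_ID = "hf-hub:apple/DFN5B-CLIP-ViT-H-14"
TOKENIZER_NAME = "ViT-H-14"


def load_dfn5b(device="cpu"):
    """Load model, image preprocessing transform, and tokenizer via OpenCLIP."""
    import open_clip  # imported lazily so this module loads without the package

    model, preprocess = open_clip.create_model_from_pretrained(MODEL_ID)
    tokenizer = open_clip.get_tokenizer(TOKENIZER_NAME)
    return model.to(device).eval(), preprocess, tokenizer


def embed(model, preprocess, tokenizer, image_path, captions, device="cpu"):
    """Return L2-normalized image and text embeddings for one image."""
    import torch
    from PIL import Image

    image = preprocess(Image.open(image_path).convert("RGB"))
    image = image.unsqueeze(0).to(device)  # add a batch dimension
    tokens = tokenizer(captions).to(device)
    with torch.no_grad():
        image_emb = model.encode_image(image)
        text_emb = model.encode_text(tokens)
    # Normalize so that dot products are cosine similarities.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return image_emb, text_emb
```

A typical call would be `model, preprocess, tokenizer = load_dfn5b()` followed by `embed(model, preprocess, tokenizer, "photo.jpg", ["a dog", "a cat"])`, where `photo.jpg` is a placeholder path.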

Core Capabilities

  • Zero-shot image classification
  • Contrastive image-text learning
  • High performance on diverse datasets (98.9% on STL-10, 95.7% on Stanford Cars)
  • Robust cross-domain generalization
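
Zero-shot classification with a CLIP-style model comes down to comparing one image embedding against the embeddings of candidate text prompts. The sketch below shows that scoring step in plain Python on toy three-dimensional vectors. The embeddings are made up, and the logit scale of 100 is an assumption (a common value for CLIP models), not a figure confirmed for this checkpoint.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between image and prompts."""
    img = l2_normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_embs
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings standing in for real encoder outputs.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]
toy_text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_image_emb = [0.9, 0.2, 0.1]

probs = zero_shot_probs(toy_image_emb, toy_text_embs)
best = labels[max(range(len(labels)), key=probs.__getitem__)]
print(best)
```

With real encoders, the toy vectors would be replaced by the image and text features the model produces; no classifier is trained, and the label set can be changed by editing the prompts.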

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its use of Data Filtering Networks (DFNs) to curate training data automatically: only 5B images were kept from a pool of 43B image-text pairs. Training on this filtered subset yields strong benchmark results, including 83.44% accuracy on ImageNet-1K.

Q: What are the recommended use cases?

The model excels in zero-shot image classification, visual-semantic understanding, and cross-modal tasks. It's particularly suitable for applications requiring robust image understanding without task-specific training, such as content organization, visual search, and automated tagging systems.
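
As an illustration of the visual search use case, the following sketch ranks a small gallery of image embeddings by cosine similarity to a text query embedding. All vectors and file names are hypothetical placeholders; in practice they would come from the model's image and text encoders, and a large gallery would normally use a vector index instead of a full sort.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_emb, gallery, top_k=2):
    """Return the names of the top_k gallery items most similar to the query."""
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine(query_emb, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy gallery: file name -> made-up image embedding.
gallery = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.2, 0.1, 0.9],
}
query_emb = [0.8, 0.3, 0.1]  # stands in for an encoded text query

results = search(query_emb, gallery)
print(results)
```

The same ranking supports automated tagging by swapping roles: embed a fixed tag vocabulary as text and keep the tags whose similarity to an image exceeds a chosen threshold.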
