Marqo-FashionCLIP
| Property | Value |
|---|---|
| Base Model | ViT-B-16 (laion2b_s34b_b88k) |
| Author | Marqo |
| Framework Support | Hugging Face, OpenCLIP, Transformers.js |
| Model URL | Hugging Face |
What is Marqo-FashionCLIP?
Marqo-FashionCLIP is a state-of-the-art fashion-oriented CLIP model that leverages Generalised Contrastive Learning (GCL) to provide highly accurate fashion product search and classification capabilities. The model has been specifically designed to understand not just text descriptions, but also categories, styles, colors, materials, keywords, and fine details of fashion items.
Implementation Details
The model is built on the ViT-B-16 architecture, starting from the laion2b_s34b_b88k checkpoint pretrained on LAION-2B, and fine-tuned for the fashion domain with Generalised Contrastive Learning. This fashion-specific fine-tuning gives it multi-modal understanding of fashion items and superior performance across various benchmarks compared to existing fashion CLIP models.
- Supports multiple integration methods including Hugging Face, OpenCLIP, and Transformers.js (see the loading sketch after this list)
- Employs Generalised Contrastive Learning for enhanced feature understanding
- Provides comprehensive fashion-specific embeddings
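As a concrete illustration of the OpenCLIP integration path, the sketch below loads the model from the Hugging Face Hub and produces normalized image and text embeddings. It is a minimal sketch, assuming the repo id `Marqo/marqo-fashionCLIP` and a local image file; the source does not prescribe this exact snippet.

```python
import torch
import open_clip
from PIL import Image

# Sketch: load Marqo-FashionCLIP through OpenCLIP's Hugging Face Hub support.
# The repo id "Marqo/marqo-fashionCLIP" and "shirt.jpg" are illustrative assumptions.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:Marqo/marqo-fashionCLIP")
tokenizer = open_clip.get_tokenizer("hf-hub:Marqo/marqo-fashionCLIP")
model.eval()

image = preprocess(Image.open("shirt.jpg")).unsqueeze(0)   # shape: (1, 3, H, W)
text = tokenizer(["a red striped linen shirt"])

with torch.no_grad():
    image_emb = model.encode_image(image)
    text_emb = model.encode_text(text)
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

print((image_emb @ text_emb.T).item())  # cosine similarity between image and caption
```

The same embeddings can back either retrieval or classification workflows; only how they are compared and ranked differs.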
Core Capabilities
- Text-to-Image search with 0.192 average recall across 6 datasets (a retrieval sketch follows this list)
- Category-to-Product matching with 0.705 average precision
- Sub-Category-to-Product classification with 0.707 average precision
- Support for multiple programming frameworks and environments
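To make the text-to-image search capability concrete, here is a hedged sketch of a tiny retrieval loop: product images are embedded once, then a free-text query is ranked against them by cosine similarity. The file names, query string, and repo id are placeholders, not values from the source.

```python
import torch
import open_clip
from PIL import Image

# Sketch of text-to-image search: rank a small product catalog against a text query.
# File names, the query, and the repo id are illustrative assumptions.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:Marqo/marqo-fashionCLIP")
tokenizer = open_clip.get_tokenizer("hf-hub:Marqo/marqo-fashionCLIP")
model.eval()

catalog = ["dress.jpg", "sneakers.jpg", "handbag.jpg"]

with torch.no_grad():
    # Embed the catalog once; in production these vectors would live in a vector index.
    images = torch.stack([preprocess(Image.open(p)) for p in catalog])
    image_embs = model.encode_image(images)
    image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)

    query = tokenizer(["a floral summer dress"])
    query_emb = model.encode_text(query)
    query_emb = query_emb / query_emb.norm(dim=-1, keepdim=True)

    # Cosine similarity between the query and every catalog image.
    scores = (query_emb @ image_embs.T).squeeze(0)

for path, score in sorted(zip(catalog, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```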
Frequently Asked Questions
Q: What makes this model unique?
Its strength comes from its specialized training with Generalised Contrastive Learning, which teaches the model to use not only free-text descriptions but also categories, styles, colors, materials, and keywords. This fashion-specific understanding is what allows it to outperform previous state-of-the-art fashion CLIP models across multiple benchmarks.
Q: What are the recommended use cases?
The model is ideal for e-commerce platforms, fashion retailers, and applications requiring precise fashion item classification, search, and recommendation systems. It excels in tasks like category matching, product search, and style analysis.
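For the category-matching use case, the snippet below sketches zero-shot classification: one product image is scored against a handful of category prompts and the best match is selected. The category list, prompt template, and file name are illustrative assumptions.

```python
import torch
import open_clip
from PIL import Image

# Sketch of zero-shot category matching: score one product image against category prompts.
# Categories, the prompt template, "item.jpg", and the repo id are illustrative assumptions.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:Marqo/marqo-fashionCLIP")
tokenizer = open_clip.get_tokenizer("hf-hub:Marqo/marqo-fashionCLIP")
model.eval()

categories = ["dress", "jacket", "sneakers", "handbag", "jeans"]
prompts = tokenizer([f"a photo of a {c}" for c in categories])

with torch.no_grad():
    image_emb = model.encode_image(preprocess(Image.open("item.jpg")).unsqueeze(0))
    text_embs = model.encode_text(prompts)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
    # Softmax over categories turns cosine similarities into a category distribution.
    probs = (100.0 * image_emb @ text_embs.T).softmax(dim=-1).squeeze(0)

print(categories[int(probs.argmax())], probs.tolist())
```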