yoloe


jameslahm

YOLOE is a highly efficient real-time object detection and segmentation model that supports text prompts, visual prompts, and prompt-free operation, achieving state-of-the-art performance.

Property      Value
Author        Ao Wang et al.
Paper         arXiv:2503.07465
Model Sizes   S, M, L variants
Parameters    12M-50M

What is YOLOE?

YOLOE (YOLO Eye) is a unified object detection and segmentation model built for real-time "seeing anything". It combines efficiency with versatility by supporting three prompt mechanisms (text prompts, visual prompts, and a prompt-free mode) within a single model architecture.
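To make the "one architecture, several prompt types" idea concrete, here is a minimal NumPy sketch. It assumes that every prompt is turned into an embedding vector and that candidate regions are scored against prompts by cosine similarity. The text embeddings are random stand-ins for a text encoder's output, and `visual_prompt_embedding` uses plain masked average pooling; that pooling is a hypothetical simplification, not YOLOE's SAVPE encoder. All function names are illustrative.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors to (almost) unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def score_regions(region_emb, prompt_emb):
    """Cosine similarity between every region (rows) and every prompt (columns)."""
    return l2_normalize(region_emb) @ l2_normalize(prompt_emb).T

def visual_prompt_embedding(feature_map, mask):
    """Hypothetical visual-prompt encoder: average the features inside a binary mask.

    feature_map: (H, W, D) array; mask: (H, W) boolean array marking the prompt region.
    """
    return feature_map[mask].mean(axis=0)

rng = np.random.default_rng(0)
D = 16
feature_map = rng.normal(size=(8, 8, D))
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True  # the user "points at" a 2x2 patch

# Stand-in for text-encoder outputs for two class names (hypothetical values).
text_prompts = rng.normal(size=(2, D))
visual_prompt = visual_prompt_embedding(feature_map, mask)[None, :]  # shape (1, D)

# Toy region embeddings: the 64 feature-map cells, flattened.
regions = feature_map.reshape(-1, D)

# The same scoring head serves both prompt types.
text_scores = score_regions(regions, text_prompts)      # shape (64, 2)
visual_scores = score_regions(regions, visual_prompt)   # shape (64, 1)
```

Because both prompt types end up as vectors of the same width, one `score_regions` head handles either; only the encoder that produces the prompt embedding changes.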

Implementation Details

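The model implements three key innovations: Re-parameterizable Region-Text Alignment (RepRTA) for text prompts, Semantic-Activated Visual Prompt Encoder (SAVPE) for visual prompts, and Lazy Region-Prompt Contrast (LRPC) for prompt-free scenarios. RepRTA refines pretrained text embeddings with a lightweight auxiliary network that can be re-parameterized away, so it adds no inference or transfer overhead. SAVPE uses decoupled semantic and activation branches to produce visual prompt embeddings at low cost. LRPC relies on a built-in large vocabulary and a specialized embedding to find all objects without depending on a costly language model. Together, these components deliver state-of-the-art accuracy while keeping inference efficient.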
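One way to picture the "lazy" matching in LRPC is as a two-stage lookup: first decide which regions contain any object at all, then compare only those regions against the large vocabulary. Below is a minimal NumPy sketch of that reading, under our own assumptions: cosine similarity for scoring, a single "any object" embedding, and a fixed threshold. The function name, threshold, and toy embeddings are hypothetical, not YOLOE's code; the toy data is built from orthonormal vectors so the expected result is unambiguous.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors to (almost) unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def lazy_region_prompt_contrast(region_emb, object_emb, vocab_emb, threshold):
    """Hypothetical two-stage, prompt-free labelling.

    1. Score every region against one 'any object' embedding.
    2. Only for regions above `threshold`, compare against the full vocabulary.
    Returns (kept_region_indices, best_vocab_index_per_kept_region).
    """
    regions = l2_normalize(region_emb)
    objectness = regions @ l2_normalize(object_emb)       # shape (N,)
    kept = np.flatnonzero(objectness > threshold)         # regions likely to hold an object
    if kept.size == 0:
        return kept, np.empty(0, dtype=int)
    sims = regions[kept] @ l2_normalize(vocab_emb).T      # shape (K, V), K kept regions only
    return kept, sims.argmax(axis=1)

rng = np.random.default_rng(0)
D, V = 32, 10
basis, _ = np.linalg.qr(rng.normal(size=(D, D)))  # columns are orthonormal
vocab = basis[:, :V].T                            # V hypothetical class embeddings
object_emb = basis[:, V]                          # hypothetical "any object" embedding
background = basis[:, V + 1:].T                   # 21 regions containing no object

true_classes = np.array([3, 7, 1])
objects = object_emb + vocab[true_classes]        # 3 regions that do contain objects
regions = np.vstack([background, objects])        # 24 regions in total

kept, labels = lazy_region_prompt_contrast(regions, object_emb, vocab, threshold=0.5)
# kept   -> [21, 22, 23]  (the three object regions)
# labels -> [3, 7, 1]
```

The saving comes from step 2: the vocabulary comparison runs on K kept regions instead of all N, which matters when the vocabulary is large and most regions are background.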

  • Multiple model variants (v8-S/M/L and 11-S/M/L) with different parameter sizes
  • Achieves up to 305.8 FPS on a T4 GPU
  • Supports both detection and segmentation tasks
  • Zero-shot capabilities on the LVIS dataset

Core Capabilities

  • Real-time object detection and segmentation
  • Multi-prompt support (text, visual, prompt-free)
  • Efficient re-parameterization for transfer learning
  • Superior performance compared to YOLO-Worldv2
  • CoreML and TensorRT deployment support
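The re-parameterization point can be illustrated with a small NumPy sketch of the general idea. The assumption here is that, once the prompt set is fixed, any auxiliary network that only transforms the text embeddings can be run once and its output stored as an ordinary classifier weight matrix, so the deployed head costs the same as a closed-set linear classifier. The residual ReLU layer below is a hypothetical stand-in for RepRTA's auxiliary network, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, D = 5, 3, 8  # regions, prompt classes, embedding width

text_emb = rng.normal(size=(C, D))     # stand-in for frozen text-encoder outputs
aux_w = rng.normal(size=(D, D)) * 0.1  # hypothetical auxiliary-network weights
aux_b = rng.normal(size=D) * 0.1

def aux_refine(t):
    """Hypothetical lightweight auxiliary network: one residual ReLU layer."""
    return t + np.maximum(t @ aux_w + aux_b, 0.0)

def training_head(regions):
    """Training-time path: prompts pass through the auxiliary network on every call."""
    return regions @ aux_refine(text_emb).T

def reparameterize():
    """Run the auxiliary network once and keep only its output as a plain weight matrix."""
    folded = aux_refine(text_emb)  # shape (C, D)

    def deployed_head(regions):
        # Inference is a single matrix product, like a closed-set linear classifier.
        return regions @ folded.T

    return deployed_head

regions = rng.normal(size=(N, D))
deployed_head = reparameterize()
print(np.allclose(training_head(regions), deployed_head(regions)))  # True
```

In a convolutional detection head the folded matrix plays the role of a 1x1 convolution's weights, so after folding the head has the same form as an ordinary closed-set classifier.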

Frequently Asked Questions

Q: What makes this model unique?

YOLOE's uniqueness lies in its ability to handle multiple prompt types within a single efficient architecture while running in real time. According to the paper, YOLOE-v8-S outperforms YOLO-Worldv2-S on LVIS with 3× less training cost and 1.4× faster inference.

Q: What are the recommended use cases?

The model is ideal for real-time object detection and segmentation applications, especially where objects must be recognized flexibly without a predefined set of categories. It is suited to deployment on both GPUs (such as the T4) and mobile devices (such as the iPhone).
