rtdetr_r50vd

PekingU

Real-time object detection transformer model achieving 53.1% AP on COCO at 108 FPS. Combines DETR accuracy with YOLO-like speed using 43M parameters.

Property         Value
Parameter Count  43M parameters
License          Apache-2.0
Paper            DETRs Beat YOLOs on Real-time Object Detection
Performance      53.1% AP on COCO, 108 FPS on T4 GPU

What is rtdetr_r50vd?

RT-DETR (Real-Time Detection Transformer) is an object detection model that aims to combine DETR-level accuracy with YOLO-level speed. Developed by researchers at Peking University, it is presented in its paper as the first real-time end-to-end object detector: it eliminates the need for Non-Maximum Suppression (NMS) while maintaining high accuracy.

Implementation Details

The model pairs an efficient hybrid encoder with uncertainty-minimal query selection. The encoder processes multi-scale features through two components: Attention-based Intra-scale Feature Interaction (AIFI) and CNN-based Cross-scale Feature Fusion (CCFF). Input images are resized to 640x640 pixels and normalized with fixed parameters before inference.
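To make the multi-scale design more concrete, the arithmetic below counts the tokens each feature level would contribute for a 640x640 input. It is a rough sketch under two assumptions that come from the RT-DETR paper rather than from this page: the backbone levels use strides 8, 16, and 32 (S3, S4, S5), and AIFI applies self-attention only to the coarsest level. The function and variable names are illustrative.

```python
# Token counts per feature level for a 640x640 input.
# Assumption: strides 8, 16, 32 (levels S3, S4, S5), as described in the paper.
INPUT_SIZE = 640
STRIDES = {"S3": 8, "S4": 16, "S5": 32}


def tokens_per_level(input_size, strides):
    """Return the number of spatial positions (tokens) at each feature level."""
    counts = {}
    for name, stride in strides.items():
        side = input_size // stride  # each feature map is side x side
        counts[name] = side * side
    return counts


def attention_pairs(num_tokens):
    """Self-attention compares every token with every token: n * n pairs."""
    return num_tokens * num_tokens


counts = tokens_per_level(INPUT_SIZE, STRIDES)

# Cost proxy if all scales were concatenated into one attention sequence.
all_scales = attention_pairs(sum(counts.values()))

# Cost proxy if attention runs on the coarsest level only (assumed AIFI behaviour).
s5_only = attention_pairs(counts["S5"])

print(counts)
print(f"pairs, all scales concatenated: {all_scales:,}")
print(f"pairs, S5 only: {s5_only:,}")
print(f"ratio: {all_scales / s5_only:.0f}x")
```

Under these assumptions the coarsest level holds 400 of the 8,400 tokens, so attending within it alone involves about 441 times fewer token pairs than attending over all scales at once. Pair counts are only a proxy for compute; they ignore the cost of the CNN-based fusion that handles the remaining scales.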

  • Trained on COCO 2017 dataset (118k training images)
  • Supports speed tuning by adjusting the number of decoder layers used at inference, without retraining
  • Achieves 53.1% AP on COCO validation set
  • Operates at 108 FPS on T4 GPU

Core Capabilities

  • Real-time object detection with 53.1% AP on COCO
  • End-to-end detection without NMS post-processing
  • Multi-scale feature processing
  • Flexible speed-accuracy trade-off
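The NMS-free capability listed above can be illustrated with a small decoding sketch. It assumes the model emits, for each query, a list of per-class probabilities and one box in normalized (cx, cy, w, h) form; the helper names, the `top_k` default of 300, and the 0.5 threshold are illustrative choices, not the library's API. The point is that decoding needs only a top-k selection and a score threshold, with no pairwise overlap suppression.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def decode_detections(scores, boxes, image_width, image_height,
                      top_k=300, threshold=0.5):
    """Pick the highest-scoring (query, class) pairs; no NMS step is applied.

    scores: one list of class probabilities per query (assumed already in [0, 1]).
    boxes:  one normalized (cx, cy, w, h) tuple per query.
    """
    # Flatten to (score, query index, class index) and rank by score.
    candidates = [
        (score, q, c)
        for q, class_scores in enumerate(scores)
        for c, score in enumerate(class_scores)
    ]
    candidates.sort(key=lambda item: item[0], reverse=True)

    detections = []
    for score, q, c in candidates[:top_k]:
        if score < threshold:
            break  # sorted descending, so everything after is lower
        detections.append({
            "score": score,
            "label": c,
            "box": cxcywh_to_xyxy(boxes[q], image_width, image_height),
        })
    return detections


# Toy input: three queries, two classes, on a hypothetical 640x480 image.
scores = [[0.9, 0.05], [0.2, 0.7], [0.1, 0.1]]
boxes = [(0.5, 0.5, 0.2, 0.4), (0.25, 0.25, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)]
detections = decode_detections(scores, boxes, image_width=640, image_height=480)

for det in detections:
    print(det)
```

With this toy input, two detections pass the threshold: query 0 as class 0 and query 1 as class 1. A detector that relies on NMS would need an extra step here to compare box overlaps and discard duplicates.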

Frequently Asked Questions

Q: What makes this model unique?

RT-DETR combines transformer-based detection with real-time performance, outperforming comparable YOLO models in both speed and accuracy while eliminating the need for NMS. It is about 21 times faster than DINO-R50 while also achieving higher accuracy.

Q: What are the recommended use cases?

The model is ideal for real-time object detection applications requiring both speed and accuracy, such as surveillance systems, autonomous driving, and real-time video analysis. Its flexible architecture allows for deployment in various scenarios with different speed-accuracy requirements.
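For video use cases, the reported throughput can be turned into a per-frame time budget. The sketch below uses the 108 FPS figure quoted for a T4 GPU and a hypothetical 30 FPS camera stream. It assumes the benchmark number covers model inference only, so video decoding, preprocessing, and data transfer would reduce the real headroom.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def max_streams(model_fps, stream_fps):
    """Whole number of streams one model instance could keep up with."""
    return int(model_fps // stream_fps)


MODEL_FPS = 108   # reported throughput on a T4 GPU
STREAM_FPS = 30   # hypothetical camera frame rate

model_latency_ms = frame_budget_ms(MODEL_FPS)
stream_budget_ms = frame_budget_ms(STREAM_FPS)
streams = max_streams(MODEL_FPS, STREAM_FPS)

print(f"model time per frame: {model_latency_ms:.2f} ms")
print(f"budget per 30 FPS frame: {stream_budget_ms:.2f} ms")
print(f"streams per model instance (upper bound): {streams}")
```

Here the model needs roughly 9.26 ms per frame against a 33.33 ms budget, so one instance could in principle serve up to three such streams. Treat that as an upper bound rather than a deployment guarantee.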
