detr-resnet-50-panoptic

facebook

DETR (DEtection TRansformer) with a ResNet-50 backbone, extended for panoptic segmentation. It reaches a box AP of 38.8 and a PQ (panoptic quality) of 43.4 on COCO 2017 validation. Developed by Facebook for end-to-end object detection.

Property	Value
Author	Facebook
License	Apache 2.0
Paper	End-to-End Object Detection with Transformers
Training Data	COCO 2017 Panoptic (118k images)

What is detr-resnet-50-panoptic?

DETR-ResNet-50-panoptic is an end-to-end object detection model that pairs a transformer encoder-decoder with a ResNet-50 backbone and extends it to panoptic segmentation. Developed by Facebook Research, it removes hand-crafted components of earlier detection pipelines, such as non-maximum suppression, while maintaining strong performance.

Implementation Details

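The model uses an encoder-decoder transformer on top of a convolutional backbone. Two heads sit on the decoder outputs: a linear layer that predicts class labels and an MLP that predicts bounding boxes. The decoder processes 100 object queries in parallel, and each query yields at most one object. Training uses a bipartite matching loss: the Hungarian algorithm computes an optimal one-to-one assignment between the query predictions and the ground-truth objects (padded with "no object" entries), and the loss is then computed on the matched pairs, with cross-entropy for the classes and a combination of L1 and generalized IoU losses for the boxes.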
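The one-to-one matching step can be illustrated with a small sketch. This is not the model's training code: it finds the minimum-cost assignment by exhaustive search over permutations, which is only practical for a handful of queries (the Hungarian algorithm reaches the same optimum in polynomial time). The cost function, the `box_weight` value, and the dictionary layout are assumptions chosen for illustration, and the generalized IoU term is left out.

```python
from itertools import permutations

def l1_box_cost(pred_box, gt_box):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))

def match_cost(pred, gt, box_weight=5.0):
    """Cost of assigning one prediction to one ground-truth object.

    Illustrative cost: negative probability of the true class plus a
    weighted L1 box distance (box_weight is an assumed value).
    """
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])

def best_matching(preds, gts):
    """Return (assignment, total_cost) with minimal total cost.

    assignment[j] is the index of the prediction matched to gts[j].
    Exhaustive search stands in for the Hungarian algorithm here.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[i], gts[j]) for j, i in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

# Three hypothetical query predictions; class index 2 plays the "no object" role.
preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.05, 0.05, 0.9], "box": (0.8, 0.8, 0.1, 0.1)},
]
# Two ground-truth objects.
gts = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

assignment, total_cost = best_matching(preds, gts)
print(assignment)
```

Here the first ground-truth object is matched to the second query and the second object to the first query. The third query stays unmatched, which in training corresponds to a "no object" target.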

ResNet-50 backbone architecture
Transformer-based encoder-decoder structure
Trained on 16 V100 GPUs for 300 epochs
Achieves 38.8% box AP and 43.4% PQ on COCO
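For readers unfamiliar with the PQ figure, the following is a minimal sketch of the panoptic quality formula for a single class, assuming predicted and ground-truth segments have already been matched (a pair counts as a true positive when its IoU exceeds 0.5). The function name and arguments are hypothetical, and the reported COCO number averages this quantity over classes, which the sketch does not do.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """PQ = sum of IoUs over matched pairs / (TP + 0.5 * FP + 0.5 * FN)."""
    true_positives = len(matched_ious)
    denominator = (
        true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # No segments at all for this class; return 0.0 by convention here.
        return 0.0
    return sum(matched_ious) / denominator

# Three matched segments, one unmatched prediction, one missed ground truth.
pq = panoptic_quality([0.9, 0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(round(pq, 3))
```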

Core Capabilities

Panoptic segmentation of images
End-to-end object detection
Class label prediction
Bounding box generation
Mask generation for segmentation
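The class label and bounding box outputs listed above need a small amount of post-processing before use. The sketch below assumes each query produces raw class logits whose last entry is the "no object" class, and a box in normalized (center x, center y, width, height) format. The helper names and the 0.5 threshold are assumptions for illustration, not part of any library API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )

def keep_detections(query_logits, query_boxes, image_width, image_height, threshold=0.5):
    """Keep queries whose best real-class score reaches the threshold."""
    detections = []
    for logits, box in zip(query_logits, query_boxes):
        probs = softmax(logits)
        no_object = len(probs) - 1  # assumption: last index is "no object"
        label = max(range(no_object), key=lambda k: probs[k])
        score = probs[label]
        if score >= threshold:
            pixel_box = cxcywh_to_xyxy(box, image_width, image_height)
            detections.append((label, score, pixel_box))
    return detections

# Two hypothetical queries on a 640x480 image: one confident, one "no object".
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.05, 0.05)]
detections = keep_detections(logits, boxes, image_width=640, image_height=480)
print(detections)
```

Because each query yields at most one object, filtering by score is enough here and no non-maximum suppression step is applied.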

Frequently Asked Questions

Q: What makes this model unique?

The model treats object detection as an end-to-end prediction problem solved with a transformer, which removes the need for hand-crafted components such as non-maximum suppression. It also extends naturally to panoptic segmentation.

Q: What are the recommended use cases?

The model suits complex scene understanding tasks that need both instance and semantic segmentation, such as autonomous driving, robotics, and image analysis systems.