detr-resnet-101

facebook

DETR-ResNet-101 is a 60.7M parameter transformer-based object detection model that achieves 43.5 AP on COCO, combining CNN and attention mechanisms for end-to-end detection.

Property      Value
Parameters    60.7M
License       Apache 2.0
Framework     PyTorch
Paper         End-to-End Object Detection with Transformers
Performance   43.5 AP on COCO

What is detr-resnet-101?

DETR-ResNet-101 is an object detection model that pairs a ResNet-101 backbone with an encoder-decoder transformer for end-to-end detection. Developed by Facebook, it departs from traditional detectors by treating detection as direct set prediction, which removes the need for hand-crafted components such as non-maximum suppression and anchor generation.

Implementation Details

The model uses an encoder-decoder transformer on top of a convolutional backbone. The decoder takes 100 learned object queries, and each query predicts at most one object in the image (or "no object"). Two prediction heads are applied to each decoder output: a linear layer for the class label and an MLP for the bounding box.

  • ResNet-101 backbone for feature extraction
  • Transformer-based encoder-decoder architecture
  • Bipartite matching loss function
  • Hungarian algorithm for optimal query-annotation matching
  • Trained on COCO 2017 dataset (118k images)
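The matching step in the list above can be illustrated with a small, self-contained sketch. It is not the model's training code: the cost below keeps only a class term and an L1 box term (the generalized IoU term used in the paper is omitted), the weight `box_weight=5.0` is an illustrative value, and a brute-force search over assignments stands in for the Hungarian algorithm, which finds the same minimum-cost assignment in polynomial time. All names (`match_cost`, `best_matching`, the toy `preds` and `gts`) are hypothetical.

```python
from itertools import permutations


def l1_box_cost(pred_box, gt_box):
    """L1 distance between two normalized (cx, cy, w, h) boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def match_cost(pred, gt, box_weight=5.0):
    """Cost of assigning one query prediction to one ground-truth object.

    Lower is better: a high probability for the true class and a small
    box error both reduce the cost. (Simplified: no GIoU term.)
    """
    class_cost = -pred["probs"][gt["label"]]
    return class_cost + box_weight * l1_box_cost(pred["box"], gt["box"])


def best_matching(preds, gts):
    """Return sorted (pred_index, gt_index) pairs with minimal total cost.

    Brute force over all one-to-one assignments, so only usable for tiny
    inputs. The Hungarian algorithm returns the same optimum efficiently.
    """
    best_pairs, best_total = [], float("inf")
    for pred_indices in permutations(range(len(preds)), len(gts)):
        total = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(pred_indices))
        if total < best_total:
            best_total = total
            best_pairs = [(p, g) for g, p in enumerate(pred_indices)]
    return sorted(best_pairs)


# Toy example: 3 queries, 2 annotated objects, 2 object classes + "no object".
preds = [
    {"probs": [0.1, 0.8, 0.1], "box": (0.5, 0.5, 0.2, 0.2)},
    {"probs": [0.7, 0.2, 0.1], "box": (0.2, 0.3, 0.1, 0.1)},
    {"probs": [0.05, 0.05, 0.9], "box": (0.8, 0.8, 0.1, 0.1)},
]
gts = [
    {"label": 0, "box": (0.2, 0.3, 0.1, 0.1)},
    {"label": 1, "box": (0.5, 0.5, 0.2, 0.2)},
]

pairs = best_matching(preds, gts)
print(pairs)  # query 0 matches object 1, query 1 matches object 0
```

Queries left unmatched (query 2 here) are the ones trained to predict "no object", which is how the model learns to avoid duplicate detections without non-maximum suppression.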

Core Capabilities

  • High-accuracy object detection (43.5 AP on COCO)
  • End-to-end detection without non-maximum suppression or anchor generation
  • Detection of multiple objects in a single image (up to one per object query)
  • Efficient parallel processing of queries
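Even without non-maximum suppression, the per-query outputs still have to be read out into a list of detections. The following is a minimal sketch of that step under stated assumptions: the "no object" class sits at the last index of each query's logits, boxes are normalized (cx, cy, w, h), and the confidence threshold of 0.7 is an illustrative choice. The function names are hypothetical and do not correspond to any library API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )


def decode_queries(logits_per_query, boxes_per_query, img_w, img_h, threshold=0.7):
    """Keep queries whose best object-class probability passes the threshold.

    Assumes the last logit of each query is the "no object" class, so it is
    excluded when picking the label.
    """
    detections = []
    for logits, box in zip(logits_per_query, boxes_per_query):
        object_probs = softmax(logits)[:-1]  # drop the "no object" entry
        score = max(object_probs)
        if score >= threshold:
            detections.append(
                {
                    "label": object_probs.index(score),
                    "score": score,
                    "box": cxcywh_to_xyxy(box, img_w, img_h),
                }
            )
    return detections


# Two queries: the first is confident about class 0, the second says "no object".
example_logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]]
example_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
detections = decode_queries(example_logits, example_boxes, img_w=100, img_h=200)
print(detections)
```

Each query is decoded independently of the others, which is why the queries can be processed in parallel.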

Frequently Asked Questions

Q: What makes this model unique?

The model takes an end-to-end approach to object detection with transformers, removing traditional hand-crafted components while achieving competitive performance. It is trained with a bipartite matching loss and reasons over the whole image through attention rather than through region proposals.

Q: What are the recommended use cases?

The model suits object detection tasks that involve multiple objects at varying scales and in varying contexts. Example applications include autonomous driving, surveillance, retail analytics, and general computer vision tasks that require accurate object detection.
