deformable-detr-DocLayNet

Aryn

A 41.1M-parameter object detection model that uses the Deformable DETR architecture and is trained on the DocLayNet dataset. It achieves 57.1 box mAP on document layout analysis.

Property         Value
Parameter Count  41.1M
License          Apache 2.0
Performance      57.1 box mAP
Paper            Deformable DETR Paper

What is deformable-detr-DocLayNet?

deformable-detr-DocLayNet is an object detection model designed for document layout analysis. It implements the Deformable DETR (DEtection TRansformer) architecture and has been trained on the DocLayNet dataset, which contains 80,000 annotated pages labeled with 11 classes.

Implementation Details

The model uses an encoder-decoder transformer architecture with a convolutional backbone. Two heads sit on top of the decoder output: a linear layer that predicts class labels and an MLP that predicts bounding boxes. Each object query produces one candidate detection of a document element. During training, the Hungarian algorithm finds a one-to-one (bipartite) matching between predictions and ground-truth elements, and the loss is computed on the matched pairs.

  • Transformer-based architecture with deformable attention
  • Trained on DocLayNet dataset with 80k annotated pages
  • Uses F32 tensor type for computations
  • Implements bipartite matching loss for training
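
The matching step described above can be illustrated with a small sketch. It is not the model's training code: the cost used here (negative class probability plus L1 box distance) is a simplified assumption, the sample predictions and targets are made up, and the brute-force search over permutations stands in for the Hungarian algorithm, which finds the same optimal assignment far more efficiently.

```python
from itertools import permutations


def matching_cost(prediction, target):
    """Simplified cost: reward high probability for the target class, penalize box distance."""
    class_cost = -prediction["probs"][target["label"]]
    box_cost = sum(abs(a - b) for a, b in zip(prediction["box"], target["box"]))
    return class_cost + box_cost


def optimal_matching(predictions, targets):
    """Return the lowest-cost one-to-one assignment as (prediction_idx, target_idx) pairs.

    Brute force over all assignments; only practical for a handful of boxes.
    Assumes there are at least as many predictions as targets.
    """
    cost = [[matching_cost(p, t) for t in targets] for p in predictions]
    best_total, best_pairs = float("inf"), []
    for chosen in permutations(range(len(predictions)), len(targets)):
        total = sum(cost[p][t] for t, p in enumerate(chosen))
        if total < best_total:
            best_total = total
            best_pairs = sorted((p, t) for t, p in enumerate(chosen))
    return best_pairs, best_total


# Hypothetical data: three object queries, two ground-truth elements.
# Boxes are (center_x, center_y, width, height), normalized to [0, 1].
predictions = [
    {"probs": {"Text": 0.9, "Table": 0.1}, "box": (0.5, 0.2, 0.8, 0.1)},
    {"probs": {"Text": 0.2, "Table": 0.8}, "box": (0.5, 0.6, 0.8, 0.3)},
    {"probs": {"Text": 0.5, "Table": 0.5}, "box": (0.1, 0.9, 0.1, 0.1)},
]
targets = [
    {"label": "Table", "box": (0.5, 0.6, 0.8, 0.3)},
    {"label": "Text", "box": (0.5, 0.2, 0.8, 0.1)},
]

pairs, total = optimal_matching(predictions, targets)
print(pairs)  # [(0, 1), (1, 0)]
```

Prediction 2 is left unmatched; in DETR-style training such queries are supervised toward a "no object" class.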

Core Capabilities

  • Document layout analysis and segmentation
  • Multiple object detection in document images
  • Bounding box prediction (57.1 box mAP)
  • Support for 11 different document element classes
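
To make the bounding box output concrete, here is a minimal post-processing sketch. It assumes, as is common for DETR-family models, that each raw box is a normalized (center_x, center_y, width, height) tuple; the function names, the sample detections, and the 0.5 score threshold are illustrative choices, not values taken from this model.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def keep_confident(detections, image_width, image_height, threshold=0.5):
    """Drop low-score detections and convert the remaining boxes to pixel coordinates."""
    return [
        {
            "label": det["label"],
            "score": det["score"],
            "box": to_pixel_box(det["box"], image_width, image_height),
        }
        for det in detections
        if det["score"] >= threshold
    ]


# Hypothetical raw detections for a 1000 x 800 pixel page image.
detections = [
    {"label": "Table", "score": 0.92, "box": (0.5, 0.5, 0.5, 0.25)},
    {"label": "Text", "score": 0.31, "box": (0.5, 0.1, 0.8, 0.1)},
]

kept = keep_confident(detections, image_width=1000, image_height=800)
print(kept[0]["box"])  # (250.0, 300.0, 750.0, 500.0)
```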

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the Deformable DETR architecture with training focused on document layouts, which makes it well suited to document analysis tasks. With deformable attention, each query attends to a small set of sampled locations rather than the entire feature map, which helps the model handle varying document layouts and element sizes.
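
The idea behind deformable attention can be sketched for a single query on a single feature map. This is a heavily reduced illustration, not the model's implementation: it uses one head, one scale, and scalar features, and the offsets and attention logits that a real model would predict from the query are passed in as plain arguments. All names are hypothetical.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2-D grid at a fractional (x, y); positions outside the grid contribute zero."""
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < width and 0 <= yi < height:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature_map[yi][xi]
    return value


def softmax(logits):
    """Normalize attention logits so the weights sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature_map, reference, offsets, logits):
    """Weighted sum of features sampled at reference + offset for each sampling point."""
    ref_x, ref_y = reference
    weights = softmax(logits)
    return sum(
        w * bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )


# Toy 3 x 3 feature map where the value at (x, y) is 3 * y + x.
feature_map = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]

# Two sampling points around the reference (1, 1), with equal attention logits.
output = deformable_attention(
    feature_map,
    reference=(1.0, 1.0),
    offsets=[(0.5, 0.0), (-0.5, 0.5)],
    logits=[0.0, 0.0],
)
print(output)
```

Only two locations are read here, regardless of the size of the feature map, which is the property that distinguishes this scheme from attending over every position.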

Q: What are the recommended use cases?

The model suits document processing applications such as automated document parsing, layout analysis, content extraction, and document structure understanding. It is particularly useful for complex documents that mix elements such as tables, text blocks, and figures.
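
For these use cases, inference might look like the sketch below. It assumes the checkpoint is published on the Hugging Face Hub under the ID `Aryn/deformable-detr-DocLayNet` and loads through the `transformers` Deformable DETR classes; neither detail is stated on this page, so verify both before relying on it. The function name `detect_layout` and the 0.5 threshold are illustrative.

```python
def detect_layout(image_path, threshold=0.5, model_id="Aryn/deformable-detr-DocLayNet"):
    """Run layout detection on one page image and return labeled pixel boxes.

    The model ID is an assumption; the first call downloads the weights.
    """
    # Imported here so that defining the function does not require the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = DeformableDetrForObjectDetection.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # post_process_object_detection expects (height, width); PIL reports (width, height).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]

    return [
        {
            "label": model.config.id2label[label.item()],
            "score": score.item(),
            "box": box.tolist(),  # (x_min, y_min, x_max, y_max) in pixels
        }
        for score, label, box in zip(result["scores"], result["labels"], result["boxes"])
    ]
```

Calling `detect_layout("page.png")` on a local page image (the file name is a placeholder) would return one dictionary per detected element. For batch processing, loading the processor and model once outside the function avoids repeated initialization.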
