deformable-detr-DocLayNet

Aryn

A specialized object detection model built on the Deformable DETR architecture and trained on the DocLayNet dataset. It has 41.1M parameters and achieves 57.1 box mAP on document layout analysis.

Property	Value
Parameter Count	41.1M
License	Apache 2.0
Performance	57.1 box mAP
Paper	Deformable DETR: Deformable Transformers for End-to-End Object Detection

What is deformable-detr-DocLayNet?

deformable-detr-DocLayNet is an object detection model designed for document layout analysis. It implements the Deformable DETR (DEtection TRansformer) architecture and was trained on the DocLayNet dataset, which contains roughly 80,000 annotated pages labeled with 11 layout classes.

Implementation Details

The model uses an encoder-decoder transformer on top of a convolutional backbone. Two heads sit on the decoder output: a linear layer that predicts class labels and an MLP that predicts bounding boxes. A fixed set of object queries is decoded in parallel, and each query predicts at most one document element. During training, the Hungarian algorithm finds a one-to-one (bipartite) matching between predictions and ground-truth elements, and the loss is computed over the matched pairs.
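The matching step can be illustrated with a small sketch. It is not the model's training code: the cost function (negative class probability plus a weighted L1 box distance), the `box_weight` value, and all names are assumptions chosen for illustration, and a brute-force search over permutations stands in for the Hungarian algorithm, which finds the same minimum-cost assignment far more efficiently.

```python
from itertools import permutations

def l1(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h) in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(pred, target, box_weight=5.0):
    # Cost is low when the prediction gives the target class a high
    # probability and its box lies close to the target box.
    # box_weight is an assumed value, not taken from the model card.
    return -pred["probs"][target["label"]] + box_weight * l1(pred["box"], target["box"])

def best_assignment(preds, targets):
    """Return a tuple whose entry t is the prediction index matched to target t.

    Brute force over all one-to-one assignments; only practical for tiny inputs.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[p], targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best

# Three object queries, two ground-truth elements (hypothetical values).
preds = [
    {"probs": [0.1, 0.9], "box": (0.50, 0.80, 0.40, 0.10)},
    {"probs": [0.8, 0.2], "box": (0.50, 0.20, 0.80, 0.30)},
    {"probs": [0.5, 0.5], "box": (0.10, 0.10, 0.05, 0.05)},
]
targets = [
    {"label": 0, "box": (0.50, 0.20, 0.80, 0.30)},
    {"label": 1, "box": (0.50, 0.80, 0.40, 0.10)},
]

assignment = best_assignment(preds, targets)
print(assignment)  # (1, 0): target 0 -> prediction 1, target 1 -> prediction 0
```

The third prediction is left unmatched. In DETR-style training, unmatched queries are supervised to predict "no object", which is what lets the model output a variable number of detections from a fixed number of queries.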

Transformer-based architecture with deformable attention
Trained on DocLayNet dataset with 80k annotated pages
Uses F32 tensor type for computations
Implements bipartite matching loss for training
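As a rough worked example, the parameter count and tensor type above give a back-of-the-envelope estimate of the weight footprint. The figure covers weights only; activations, image preprocessing, and framework overhead are not included, so actual memory use at inference will be higher.

```python
PARAMS = 41.1e6        # parameter count from the model card
BYTES_PER_F32 = 4      # one 32-bit float per parameter

weight_bytes = PARAMS * BYTES_PER_F32
weight_mb = weight_bytes / 1e6       # decimal megabytes
weight_mib = weight_bytes / 2**20    # binary mebibytes

print(f"{weight_mb:.1f} MB ({weight_mib:.1f} MiB)")  # 164.4 MB (156.8 MiB)
```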

Core Capabilities

Document layout analysis and segmentation
Multiple object detection in document images
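Bounding box prediction (57.1 box mAP)
Support for the 11 DocLayNet element classes: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title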

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the Deformable DETR architecture with training on document layouts, which makes it well suited to document analysis tasks. Instead of attending to every position in the feature map, its deformable attention mechanism attends to a small set of sampling points around a reference point, which helps it handle varying document layouts and element sizes.
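To make the deformable attention idea concrete, here is a heavily simplified sketch: a single head, a single feature scale, and scalar features. In the actual architecture the sampling offsets and attention scores are predicted from each query by learned linear layers and the sampled values pass through a learned projection; here they are supplied by hand, and every name is hypothetical.

```python
import math

def bilinear_sample(fmap, x, y):
    """Sample a 2D feature map at fractional (x, y); out-of-range cells count as zero."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * fmap[yi][xi]
    return value

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def deformable_attention(fmap, ref_point, offsets, scores):
    """Weighted sum of a few sampled points around ref_point.

    offsets: (dx, dy) per sampling point, relative to the reference point.
    scores:  unnormalized attention score per sampling point.
    """
    weights = softmax(scores)
    rx, ry = ref_point
    return sum(
        w * bilinear_sample(fmap, rx + dx, ry + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )

fmap = [
    [0.0, 1.0, 2.0],
    [3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0],
]
out = deformable_attention(
    fmap,
    ref_point=(1.0, 1.0),
    offsets=[(0.0, 0.0), (0.5, 0.0), (0.0, -1.0), (1.0, 1.0)],
    scores=[0.0, 0.0, 0.0, 0.0],  # equal scores -> uniform weights
)
print(out)  # 4.375 = (4.0 + 4.5 + 1.0 + 8.0) / 4
```

The cost of each query depends on the number of sampling points (four here) rather than on the size of the feature map, which is the main practical difference from dense attention.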

Q: What are the recommended use cases?

The model is ideal for document processing applications, including: automated document parsing, layout analysis, content extraction, and document structure understanding. It's particularly useful for processing complex documents with multiple elements like tables, text blocks, and figures.
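A parsing pipeline along these lines might look like the sketch below. It assumes the checkpoint is available on the Hugging Face Hub under the id `Aryn/deformable-detr-DocLayNet` and that `transformers`, `torch`, and `Pillow` are installed; the helper names, the 0.5 threshold, and the single-column reading-order heuristic are illustrative assumptions, not part of the model.

```python
def detect_layout(image_path, threshold=0.5, model_id="Aryn/deformable-detr-DocLayNet"):
    """Run the detector on one page image and return a list of detections.

    model_id is an assumed Hub identifier; adjust it to wherever the
    checkpoint actually lives.
    """
    # Heavy dependencies are imported lazily so the helpers below
    # can be used without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = DeformableDetrForObjectDetection.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes expects (height, width); PIL's image.size is (width, height).
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=[image.size[::-1]]
    )[0]

    return [
        {
            "label": model.config.id2label[label.item()],
            "score": score.item(),
            "box": box.tolist(),  # [x_min, y_min, x_max, y_max] in pixels
        }
        for score, label, box in zip(result["scores"], result["labels"], result["boxes"])
    ]

def select(detections, labels):
    """Keep only detections whose label is in the given collection."""
    wanted = set(labels)
    return [d for d in detections if d["label"] in wanted]

def reading_order(detections):
    """Sort top-to-bottom, then left-to-right (assumes a single-column page)."""
    return sorted(detections, key=lambda d: (d["box"][1], d["box"][0]))
```

For example, `reading_order(select(detect_layout("page.png"), ["Title", "Text", "Table"]))` would return the title, text blocks, and tables of a hypothetical `page.png` in approximate reading order. Multi-column pages need a more careful ordering step than the simple sort used here.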