dit-doclaynet

jzju

A specialized document layout analysis model based on the DiT architecture, fine-tuned on DocLayNet for 4 epochs to segment 11 document element types.

Property | Value
Author | jzju
Base Model | microsoft/dit-large
Training Data | DocLayNet-v1.1
Model Hub | Hugging Face

What is dit-doclaynet?

dit-doclaynet is a specialized document layout analysis model built on Microsoft's Document Image Transformer (DiT) architecture. It is trained to perform semantic segmentation of document images, assigning each pixel to one of 11 document element types such as captions, footnotes, and formulas.

Implementation Details

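The model uses the BeitForSemanticSegmentation architecture and was trained for 4 epochs on the DocLayNet-v1.1 dataset. It takes a preprocessed page image and returns logits of shape (batch_size, num_labels, height, width), where each channel along num_labels scores one document element type. The height and width of this logit map can be smaller than those of the input image, so the logits are normally upsampled to the page size before taking the per-pixel argmax.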

  • Fine-tuned from the microsoft/dit-large checkpoint
  • Uses AutoImageProcessor for image preprocessing
  • Outputs 11-class semantic segmentation maps
  • Supports standard document image resolutions
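A minimal inference sketch follows. It assumes the checkpoint is published on the Hugging Face Hub as jzju/dit-doclaynet (inferred from the author and model name above, not confirmed) and that torch, transformers, and Pillow are installed. The helper names segment_page and logits_to_mask are hypothetical. Because the logit map can be smaller than the page, logits_to_mask takes the per-pixel argmax and then upsamples the result to the original image size with nearest-neighbour indexing.

```python
import numpy as np


def logits_to_mask(logits: np.ndarray, height: int, width: int) -> np.ndarray:
    """Turn (batch, num_labels, h, w) logits into (batch, height, width) label-id masks."""
    _, _, h, w = logits.shape
    # Highest-scoring class for every pixel of the (possibly low-resolution) logit map.
    low_res = logits.argmax(axis=1)  # (batch, h, w)
    # Nearest-neighbour upsampling: map each output row/column to a source row/column.
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return low_res[:, rows[:, None], cols[None, :]]


def segment_page(image_path: str, model_id: str = "jzju/dit-doclaynet") -> np.ndarray:
    """Return a (height, width) label-id mask for one page image.

    Assumption: model_id is a guessed Hub ID; heavy dependencies are imported lazily.
    """
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, BeitForSemanticSegmentation

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = BeitForSemanticSegmentation.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")  # the model expects RGB input
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, num_labels, h, w)

    width, height = image.size  # PIL reports (width, height)
    return logits_to_mask(logits.numpy(), height, width)[0]
```

Taking the argmax before upsampling is cheap. Upsampling the logits bilinearly first (for example with torch.nn.functional.interpolate) and then taking the argmax usually gives smoother region boundaries.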

Core Capabilities

  • Identifies and segments 11 document elements: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, and Title
  • Processes RGB document images
  • Generates pixel-wise segmentation masks
  • Supports batch processing of documents
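Each value in a segmentation mask is a class id. The sketch below reports how much of a page each element type covers. The ID2LABEL mapping is hypothetical: it numbers the 11 labels in the order listed above, whereas a real pipeline should use model.config.id2label from the loaded checkpoint. The function name class_coverage is likewise illustrative. For a batch of masks, apply it to each page in turn.

```python
import numpy as np

# Hypothetical id-to-label mapping (the alphabetical list above).
# Replace with model.config.id2label from the actual checkpoint.
ID2LABEL = dict(enumerate([
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]))


def class_coverage(mask: np.ndarray, id2label: dict = ID2LABEL) -> dict:
    """Return the fraction of pixels assigned to each class present in a 2-D mask."""
    ids, counts = np.unique(mask, return_counts=True)
    # Unknown ids (e.g. an extra background class) are kept under a generic name.
    return {
        id2label.get(i, f"id_{i}"): count / mask.size
        for i, count in zip(ids.tolist(), counts.tolist())
    }
```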

Frequently Asked Questions

Q: What makes this model unique?

It pairs a large DiT backbone, pretrained on document images, with pixel-level segmentation of the 11 DocLayNet element classes. This fine-grained labelling makes it particularly useful for document understanding and processing tasks.

Q: What are the recommended use cases?

This model is well suited to document processing pipelines, academic paper analysis, automated document understanding systems, and any application requiring detailed document structure analysis. It is particularly useful for extracting structured information from complex document layouts.
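The model returns pixel masks rather than bounding boxes, so extracting structured information usually needs a post-processing step that groups pixels into regions. A minimal sketch, assuming a 2-D label-id mask (the per-pixel argmax of the logits, upsampled to page size): the hypothetical helper class_regions flood-fills 4-connected regions of one class and returns their bounding boxes. Under the hypothetical alphabetical id order of the label list above, class 8 would be Table.

```python
from collections import deque

import numpy as np


def class_regions(mask: np.ndarray, class_id: int, min_pixels: int = 1) -> list:
    """Bounding boxes (top, left, bottom, right), inclusive, of 4-connected class regions."""
    target = mask == class_id
    seen = np.zeros_like(target, dtype=bool)
    height, width = target.shape
    boxes = []
    for r in range(height):
        for c in range(width):
            if not target[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill of one connected region, tracking its extent.
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right = r, c, r, c
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and target[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are often segmentation noise.
            if size >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

The min_pixels threshold filters out speckle. For large pages, scipy.ndimage.label performs the same connected-component labelling much faster than this pure-Python loop.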
