donut-base-finetuned-cord-v2


naver-clova-ix

OCR-free document understanding transformer (Donut) fine-tuned on the CORD receipt dataset. It pairs a Swin Transformer vision encoder with a BART text decoder to parse document images directly into structured text.

License: MIT
Paper: OCR-free Document Understanding Transformer
Downloads: 15,251
Tags: Image-to-Text, Vision, Transformers

What is donut-base-finetuned-cord-v2?

Donut (Document understanding transformer) reads document images directly, without a separate OCR (Optical Character Recognition) stage. This checkpoint is fine-tuned on CORD (Consolidated Receipt Dataset), a collection of receipt images annotated for document parsing, so it is specialized for extracting structured fields from receipts. It was developed by researchers at Naver Clova IX.

Implementation Details

The architecture has two components: a Swin Transformer vision encoder and a BART text decoder. The encoder turns the input image into a sequence of embeddings; the decoder then generates the output text autoregressively, one token at a time, conditioned on those embeddings and on the tokens it has already produced.

  • Vision Encoder: Swin Transformer architecture for image processing
  • Text Decoder: BART-based autoregressive text generation
  • End-to-end training without OCR dependency
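To make the autoregressive step concrete, here is a minimal greedy-decoding sketch. It assumes a hypothetical `step` function standing in for the BART decoder (in the real model that decoder attends to the Swin encoder's embeddings). `greedy_decode` and `toy_step` are illustrative names, not part of any library, and the toy decoder simply replays a fixed answer instead of computing real scores.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical decoder interface: given the encoder output and the tokens
# generated so far, return a score for each candidate next token.
DecoderStep = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_decode(
    encoding: Sequence[float],
    step: DecoderStep,
    prompt: List[str],
    eos: str = "</s>",
    max_len: int = 32,
) -> List[str]:
    """Extend `prompt` one token at a time until `eos` or `max_len` tokens."""
    tokens = list(prompt)
    while len(tokens) < max_len:
        scores = step(encoding, tokens)           # condition on image + prefix
        next_token = max(scores, key=scores.get)  # greedy: highest score wins
        tokens.append(next_token)
        if next_token == eos:
            break
    return tokens


def toy_step(encoding: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder: ignores `encoding`, replays a script."""
    script = ["<s_total>", "12.00", "</s_total>", "</s>"]
    target = script[len(tokens) - 1]  # assumes a one-token prompt
    return {tok: (1.0 if tok == target else 0.0) for tok in script}


print(greedy_decode([0.1, 0.2], toy_step, ["<s_cord-v2>"]))
# ['<s_cord-v2>', '<s_total>', '12.00', '</s_total>', '</s>']
```

Decoding stops at the end-of-sequence token or when `max_len` is reached; a real generation loop adds options such as beam search, but the conditioning on image plus prefix is the same idea.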

Core Capabilities

  • Document parsing and understanding
  • Direct image-to-text conversion
  • Structured information extraction from documents
  • OCR-free text recognition and understanding
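Structured extraction works by having the decoder emit a flat token sequence in which each field is wrapped in `<s_key>...</s_key>` tags and repeated groups are separated by a `<sep/>` token; in Hugging Face Transformers, `DonutProcessor.token2json` converts such a sequence into JSON. The sketch below is a simplified, hypothetical re-implementation (`tokens_to_dict`) that assumes well-formed tags. The field names in the example (`menu`, `nm`, `price`, `total_price`) are CORD-style names used for illustration only.

```python
import re
from typing import Any

OPEN_TAG = re.compile(r"<s_([^>]+)>")
SEP = "<sep/>"


def tokens_to_dict(seq: str) -> Any:
    """Turn Donut-style tagged text into nested dicts, lists, and strings.

    Simplified sketch: assumes every <s_key> has a matching </s_key> and that
    a key is never nested inside itself.
    """
    seq = seq.strip()
    if not seq.startswith("<s_"):
        return seq  # leaf value, e.g. "4.50"
    groups = [{}]  # one dict per <sep/>-separated group at this level
    pos = 0
    while True:
        while pos < len(seq) and seq[pos].isspace():
            pos += 1
        if pos >= len(seq):
            break
        if seq.startswith(SEP, pos):
            groups.append({})  # start the next repeated group
            pos += len(SEP)
            continue
        match = OPEN_TAG.match(seq, pos)
        if match is None:
            break  # ignore trailing text that is not a field
        key = match.group(1)
        end_tag = f"</s_{key}>"
        end = seq.find(end_tag, match.end())
        if end == -1:
            break  # unterminated field: stop parsing this level
        groups[-1][key] = tokens_to_dict(seq[match.end():end])
        pos = end + len(end_tag)
    return groups[0] if len(groups) == 1 else groups


raw = (
    "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_price>3.00</s_price></s_menu>"
    "<s_total><s_total_price>7.50</s_total_price></s_total>"
)
print(tokens_to_dict(raw))
```

Here the two `menu` groups become a list of dicts and `total` becomes a single dict, which is the shape a receipt-parsing application would typically consume.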

Frequently Asked Questions

Q: What makes this model unique?

Its distinguishing feature is the OCR-free approach: the model reads the document image and emits structured text in a single pass, with no intermediate OCR step. This avoids the cost of running a separate OCR engine and keeps OCR errors from propagating into the parsing stage, which can make it more efficient and, in some settings, more accurate than OCR-based pipelines.

Q: What are the recommended use cases?

The model is suited to parsing structured documents. Because it is fine-tuned on CORD, it is best matched to receipts similar to those in that dataset; other structured documents such as forms and invoices fit the general Donut approach but are likely to need fine-tuning on matching data.
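For receipt parsing with this checkpoint, the sketch below follows the usage pattern shown in the Hugging Face Transformers Donut documentation (`DonutProcessor`, `VisionEncoderDecoderModel`, the `<s_cord-v2>` task prompt, and `token2json`). The helper names `parse_receipt` and `strip_special_tokens` are hypothetical, and the generation settings are assumptions rather than tuned values. The heavy imports sit inside `parse_receipt` so the module loads without them; the first call downloads the model weights.

```python
import re

MODEL_ID = "naver-clova-ix/donut-base-finetuned-cord-v2"
TASK_PROMPT = "<s_cord-v2>"  # task start token for this checkpoint


def strip_special_tokens(sequence: str, eos: str, pad: str) -> str:
    """Drop EOS/PAD markers and the leading task token from decoded text."""
    sequence = sequence.replace(eos, "").replace(pad, "")
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_receipt(image_path: str) -> dict:
    """Run the CORD-v2 checkpoint on one receipt image and return a dict."""
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(MODEL_ID)
    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # Encoder input: resized, normalized pixels. Decoder input: the task prompt.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = strip_special_tokens(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    return processor.token2json(sequence)  # tagged tokens -> nested dict
```

For example, `parse_receipt("receipt.jpg")` (a hypothetical file) would return a nested dict of the receipt fields the model extracted.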
