donut-base-finetuned-invoices

Maintained By: to-be

Property: Value
License: cc-by-nc-sa-4.0
Research Paper: OCR-free Document Understanding Transformer
Input Resolution: 1280x1920 pixels
Training Duration: 4 hours (20k steps)

What is donut-base-finetuned-invoices?

This model is a specialized version of the Donut architecture, fine-tuned specifically for processing and understanding invoices across multiple languages. It combines a Swin Transformer vision encoder with a BART text decoder to extract key information from invoice documents without traditional OCR methods.
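The encoder-decoder flow described above can be sketched as follows. This is a minimal sketch, assuming the checkpoint loads with the Hugging Face transformers classes DonutProcessor and VisionEncoderDecoderModel. The names extract_fields and clean_sequence are illustrative, and model_id and task_prompt are left as required parameters because the Hub identifier and the task prompt used during fine-tuning are not stated here.

```python
import re


def clean_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Drop EOS/PAD tokens and the leading task-prompt tag from decoded output."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # The first <...> tag is the task prompt that was fed to the decoder.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_fields(image_path: str, model_id: str, task_prompt: str, device: str = "cpu"):
    """Run one invoice image through the model and return the parsed fields."""
    # Heavy dependencies are imported lazily so the helper above stays usable alone.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    model.eval()

    # Vision encoder input: the page image is converted to pixel values, no OCR step.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

    # Text decoder input: generation starts from the task prompt token.
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids.to(device)

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        return_dict_in_generate=True,
    )

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = clean_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json converts the tagged token sequence into a nested dict.
    return processor.token2json(sequence)
```

Calling extract_fields requires the pillow, torch, and transformers packages plus a downloaded checkpoint; clean_sequence has no such dependencies.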

Implementation Details

The model was trained on an NVIDIA RTX A4000 GPU using a proprietary dataset of thousands of annotated invoices and non-invoices. It processes single-page documents at an input resolution of 1280x1920 pixels, which is optimized for scans at 150 DPI or lower.
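The relationship between scan DPI and the 1280x1920 input can be checked with a little arithmetic. The sketch below assumes that 1280x1920 means width x height (portrait) and that larger scans are downscaled with the aspect ratio preserved; the helper names page_pixels and fit_within are illustrative and not part of the model's tooling.

```python
from typing import Tuple

MM_PER_INCH = 25.4
# Assumption: 1280 is the width and 1920 the height of the model input.
MODEL_WIDTH, MODEL_HEIGHT = 1280, 1920


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> Tuple[int, int]:
    """Pixel size of a page with the given physical size scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def fit_within(
    width: int,
    height: int,
    max_width: int = MODEL_WIDTH,
    max_height: int = MODEL_HEIGHT,
) -> Tuple[int, int]:
    """Downscale (never upscale) so the page fits the model input, keeping aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


# An A4 page (210 mm x 297 mm) at 150 DPI already fits without resizing.
print(page_pixels(210, 297, 150))   # (1240, 1754)
# The same page at 300 DPI has to be roughly halved.
print(fit_within(*page_pixels(210, 297, 300)))  # (1280, 1811)
```

Under these assumptions, an A4 scan at 150 DPI fits inside the input window unchanged, while higher-DPI scans lose detail to downscaling, which is consistent with the 150 DPI recommendation.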

  • Trained for 20,000 steps with a final validation metric of 0.034
  • Supports extraction of key fields including DocType, Currency, DocumentDate, GrossAmount, InvoiceNumber, NetAmount, TaxAmount, OrderNumber, and CreditorCountry
  • Implements a vision-encoder-decoder architecture for end-to-end document understanding
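To illustrate how the listed fields might be consumed downstream, here is a minimal sketch that parses a Donut-style tagged output string into a dictionary. It assumes the model emits each field as <s_Name>value</s_Name>, which is the usual Donut convention but is not confirmed for this checkpoint. The function parse_fields and the sample string are hypothetical.

```python
import re
from typing import Dict

# Field names as listed in the model description.
FIELDS = (
    "DocType", "Currency", "DocumentDate", "GrossAmount", "InvoiceNumber",
    "NetAmount", "TaxAmount", "OrderNumber", "CreditorCountry",
)

# Matches <s_Name>value</s_Name>; the backreference forces matching open/close tags.
_TAG = re.compile(r"<s_(\w+)>(.*?)</s_\1>", re.DOTALL)


def parse_fields(sequence: str) -> Dict[str, str]:
    """Return every known field, using an empty string for fields not found."""
    found = {name: value.strip() for name, value in _TAG.findall(sequence)}
    return {name: found.get(name, "") for name in FIELDS}


# Hypothetical output string, for illustration only.
sample = (
    "<s_DocType>Invoice</s_DocType>"
    "<s_Currency>EUR</s_Currency>"
    "<s_GrossAmount> 119.00 </s_GrossAmount>"
)
fields = parse_fields(sample)
print(fields["DocType"], fields["Currency"], fields["GrossAmount"])
```

Returning all nine keys, even when some are missing from the output, keeps downstream code free of key checks; values stay as strings because date and amount formats vary across invoices.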

Core Capabilities

  • Multilingual invoice processing
  • OCR-free document understanding
  • Automatic field extraction and classification
  • Document type identification (Invoice vs Other)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to process invoices without traditional OCR, using a transformer-based architecture that can handle multiple languages and various invoice formats in a single pass.

Q: What are the recommended use cases?

The model is suited to automated invoice processing systems, financial document analysis, and research applications that require multilingual invoice understanding. It is particularly useful for organizations that handle international invoices and need automated data extraction.
