Nougat-LaTeX-base

Property	Value
Parameter Count	349M
License	Apache 2.0
Model Type	Donut Vision-Encoder-Decoder
Token Accuracy	62.38%

What is Nougat-LaTeX-base?

Nougat-LaTeX-base is a vision-encoder-decoder model fine-tuned from facebook/nougat-base to convert images of mathematical equations into LaTeX code. It uses adaptive padding and an adjusted input resolution to handle equation image segments more effectively than its predecessor.

Implementation Details

The model follows the Donut architecture, with modifications that accommodate varying input resolutions and avoid rescaling artifacts. It reports a token accuracy of 62.38% and a normalized edit distance of 0.0618.

Adaptive padding approach for optimal image processing
Beam search generation strategy for improved output quality
Optimized for handling mathematical equation images
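
The padding idea listed above can be illustrated with a small sketch. This is not the model's actual preprocessing code: the function name pad_to_canvas, the centered placement, and the white fill value are assumptions made for illustration. It only shows the general principle of placing a small equation crop on a fixed-size canvas without rescaling its pixels.

```python
def pad_to_canvas(image, target_h, target_w, fill=255):
    """Place a rectangular image (list of pixel rows) on a larger canvas.

    Hypothetical helper: the pixels are copied unchanged, so no rescaling
    artifacts are introduced. Centering and the fill value are assumptions.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    if h > target_h or w > target_w:
        # A real pipeline would have to downscale first; this sketch does not.
        raise ValueError("image is larger than the target canvas")

    top = (target_h - h) // 2
    left = (target_w - w) // 2

    # Start from a canvas filled with the background value.
    canvas = [[fill] * target_w for _ in range(target_h)]
    for r, row in enumerate(image):
        # Overwrite a slice of equal length, so the row width is preserved.
        canvas[top + r][left:left + w] = row
    return canvas


# A 2x2 black crop placed on a 4x6 white canvas.
padded = pad_to_canvas([[0, 0], [0, 0]], target_h=4, target_w=6)
```

Padding rather than stretching keeps stroke widths and symbol proportions as they were in the source crop, which is the motivation the description gives for avoiding rescaling.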

Core Capabilities

High-accuracy conversion of mathematical equations to LaTeX code
Robust handling of various image resolutions
Support for complex mathematical notation and symbols
Beam search decoding for higher-quality output
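
To make the beam search item concrete, here is a toy sketch of the decoding strategy in general. It is not the model's generation code: toy_model, its token probabilities, and the beam_search signature are all hypothetical, and hypotheses are compared by raw cumulative log-probability with no length normalization.

```python
import math


def beam_search(next_log_probs, beam_width=3, max_len=10, eos="<eos>"):
    """Keep the beam_width best partial sequences at every step."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + (tok,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)

        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))  # complete hypothesis
            else:
                beams.append((tokens, score))     # still being extended
        if not beams:
            break

    pool = finished or beams
    return max(pool, key=lambda c: c[1], default=((), 0.0))[0]


def toy_model(prefix):
    """Hypothetical next-token distribution, given the tokens so far."""
    table = {
        (): {"x": math.log(0.6), "y": math.log(0.4)},
        ("x",): {"^2": math.log(0.55), "_1": math.log(0.45)},
        ("y",): {"<eos>": math.log(1.0)},
    }
    return table.get(prefix, {"<eos>": 0.0})


greedy = beam_search(toy_model, beam_width=1)  # follows "x", the locally best token
wider = beam_search(toy_model, beam_width=2)   # also explores "y"
```

In this toy setup, a beam width of 1 commits to the locally best first token and ends with a sequence of probability 0.33, while a beam width of 2 also keeps the alternative and finds a sequence of probability 0.4. That is the sense in which beam search can improve output quality, at the cost of scoring more candidates per step.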

Frequently Asked Questions

Q: What makes this model unique?

The model's adaptive padding approach and specialized fine-tuning on the im2latex-100k dataset make it particularly effective at handling equation images, outperforming alternatives like pix2tex in both token accuracy and edit distance metrics.
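
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them. The exact definitions behind the reported figures are not given here, so the normalization by the longer sequence and the position-wise token comparison are assumptions, as are all function names.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, ref):
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(pred), len(ref))
    return edit_distance(pred, ref) / longest if longest else 0.0


def token_accuracy(pred_tokens, ref_tokens):
    """Fraction of positions where predicted and reference tokens agree."""
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / longest


# One wrong symbol in an 11-character formula.
ned = normalized_edit_distance(r"\frac{a}{b}", r"\frac{a}{c}")
acc = token_accuracy(["\\frac", "{", "a", "}", "{", "b", "}"],
                     ["\\frac", "{", "a", "}", "{", "c", "}"])
```

Lower is better for normalized edit distance and higher is better for token accuracy, so a model that leads on both has fewer character-level errors and more exactly matching tokens.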

Q: What are the recommended use cases?

The model is ideal for academic and technical documentation workflows, particularly when converting mathematical equations from images to LaTeX code for digital publishing, document conversion, or content management systems.