Nougat-LaTeX-base
Property | Value |
---|---|
Parameter Count | 349M |
License | Apache 2.0 |
Model Type | Donut Vision-Encoder-Decoder |
Accuracy | 62.38% token accuracy |
What is Nougat-LaTeX-base?
Nougat-LaTeX-base is a vision-encoder-decoder model fine-tuned from facebook/nougat-base to convert images of mathematical equations into LaTeX code. It uses adaptive padding and an adjusted input resolution to handle equation image segments more effectively than its base model.
Implementation Details
The model builds on the Donut framework, with modifications to handle varying input resolutions and avoid rescaling artifacts. It reports 62.38% token accuracy and a normalized edit distance of 0.0618 on its evaluation benchmarks.
- Adaptive padding approach for optimal image processing
- Beam search generation strategy for improved output quality
- Optimized for handling mathematical equation images
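The adaptive padding idea can be illustrated with a small planning function. This is a minimal sketch of one plausible reading, not the model's actual preprocessing: it assumes the image is only shrunk when it does not fit the canvas, is never upscaled, and is centered with the remaining space padded. The function name, the `PadPlan` type, and the 560x448 canvas used in the note below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PadPlan:
    scale: float      # factor applied before padding (1.0 means no rescaling)
    new_width: int    # image width after optional shrinking
    new_height: int   # image height after optional shrinking
    left: int
    top: int
    right: int
    bottom: int


def plan_adaptive_padding(width: int, height: int,
                          target_width: int, target_height: int) -> PadPlan:
    """Plan how to place an image on a fixed canvas, preferring padding to rescaling.

    Assumption: centered placement and shrink-only scaling; the real processor may differ.
    """
    if min(width, height, target_width, target_height) <= 0:
        raise ValueError("all dimensions must be positive")

    # Shrink only when the image does not fit; small crops keep their resolution.
    scale = min(1.0, target_width / width, target_height / height)
    new_width = min(target_width, max(1, round(width * scale)))
    new_height = min(target_height, max(1, round(height * scale)))

    # Split the leftover space between the two sides; any odd pixel goes right/bottom.
    pad_w = target_width - new_width
    pad_h = target_height - new_height
    left, top = pad_w // 2, pad_h // 2
    return PadPlan(scale, new_width, new_height, left, top, pad_w - left, pad_h - top)
```

Under these assumptions, a 200x50 equation crop placed on a 560x448 canvas keeps its original resolution and receives 180 pixels of padding on the left and right and 199 pixels on the top and bottom, so thin equation strips are not stretched to fill the encoder input.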
Core Capabilities
- High-accuracy conversion of mathematical equations to LaTeX code
- Robust handling of various image resolutions
- Support for complex mathematical notation and symbols
- Beam search decoding for higher-quality output
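To show why beam search can improve output quality over greedy decoding, here is a toy sketch. It is not the model's decoder: `beam_search`, the `TOY` probability table, and the token strings are all made up for illustration, and the sketch applies no length normalization.

```python
import math


def beam_search(step_log_probs, start, end, beam_width=3, max_len=10):
    """Return (tokens, score) for the best sequence found with a fixed-width beam.

    step_log_probs(prefix) must return a dict mapping each candidate
    next token to its log-probability given the prefix.
    """
    beams = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (tokens + [token], score + log_p)
            for tokens, score in beams
            for token, log_p in step_log_probs(tokens).items()
        ]
        if not candidates:
            break
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])


# Hypothetical next-token probabilities, keyed by the full prefix.
TOY = {
    ("<s>",): {"x": 0.6, "\\frac": 0.4},
    ("<s>", "x"): {"</s>": 0.5, "+": 0.5},
    ("<s>", "x", "+"): {"</s>": 1.0},
    ("<s>", "\\frac"): {"{a}{b}": 1.0},
    ("<s>", "\\frac", "{a}{b}"): {"</s>": 1.0},
}


def toy_step(prefix):
    return {t: math.log(p) for t, p in TOY.get(tuple(prefix), {}).items()}


greedy_tokens, _ = beam_search(toy_step, "<s>", "</s>", beam_width=1)
beam_tokens, _ = beam_search(toy_step, "<s>", "</s>", beam_width=2)
```

With a beam width of 1 (greedy), the search commits to the locally likelier first token `x` and ends with a sequence of probability 0.3. With a beam width of 2, it keeps the `\frac` hypothesis alive and finds the sequence of probability 0.4. In the Hugging Face Transformers library, beam search is typically enabled by passing `num_beams` to `generate`; the beam width this model uses is not stated here.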
Frequently Asked Questions
Q: What makes this model unique?
The model's adaptive padding approach and specialized fine-tuning on the im2latex-100k dataset make it particularly effective at handling equation images, outperforming alternatives like pix2tex in both token accuracy and edit distance metrics.
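The two metrics named above can be sketched as follows. The exact definitions used in the model's evaluation are not given here, so this is an assumption: token accuracy is taken as the share of position-wise matching tokens over the longer sequence, and normalized edit distance as the Levenshtein distance divided by the longer string's length. All function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction, reference):
    """Levenshtein distance scaled to [0, 1]; lower is better."""
    longest = max(len(prediction), len(reference))
    return 0.0 if longest == 0 else levenshtein(prediction, reference) / longest


def token_accuracy(pred_tokens, ref_tokens):
    """Fraction of aligned positions whose tokens match; higher is better."""
    longest = max(len(pred_tokens), len(ref_tokens))
    if longest == 0:
        return 1.0
    matches = sum(p == r for p, r in zip(pred_tokens, ref_tokens))
    return matches / longest


reference = r"\frac { a } { b }"
prediction = r"\frac { a } { c }"
acc = token_accuracy(prediction.split(), reference.split())
ned = normalized_edit_distance(prediction, reference)
```

For this pair, one of the seven whitespace-separated tokens differs, so the token accuracy is 6/7, and a single character substitution across 17 characters gives a normalized edit distance of 1/17.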
Q: What are the recommended use cases?
The model is ideal for academic and technical documentation workflows, particularly when converting mathematical equations from images to LaTeX code for digital publishing, document conversion, or content management systems.