StructTable-base

Property	Value
Parameter Count	324M
License	Apache 2.0
Paper	arXiv:2406.11633
Languages	English, Chinese
Model Type	Image-to-Text

What is StructTable-base?

StructTable-base is an image-to-text model that converts table images into LaTeX code, with a focus on scientific documents. Trained on the DocGenome benchmark, it handles complex tables with merged cells and sophisticated layouts, such as those found in arXiv and SciHub papers.

Implementation Details

The model is a 324M-parameter transformer built on the Pix2Struct framework for image-to-text conversion. It supports image token configurations of 2048 and 4096, and the latest version speeds up inference with TensorRT acceleration.

Primarily supports Times New Roman and Songti (宋体) fonts
Uses TensorRT-LLM for 10x faster inference
Processes merged cells and complex table structures
Outputs in LaTeX format with HTML/Markdown conversion options
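
To give a feel for what the 2048 and 4096 image token configurations mean in practice, here is a minimal sketch of Pix2Struct-style patch budgeting: the image is rescaled, keeping its aspect ratio, so that the number of fixed-size patches stays within the budget. It assumes that the "image token" count corresponds to the maximum number of patches and that patches are 16x16 pixels; neither is stated above, and the function name patch_grid is purely illustrative.

```python
import math


def patch_grid(height: int, width: int, max_patches: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid for an image under a patch budget.

    Assumption: 16x16 patches and a budget equal to the image token setting.
    """
    # Scale factor that makes rows * cols land just under max_patches
    # while preserving the aspect ratio of the input image.
    scale = math.sqrt(max_patches * (patch_size / height) * (patch_size / width))
    rows = max(min(math.floor(scale * height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * width / patch_size), max_patches), 1)
    return rows, cols


# A 600x800 (height x width) table crop under both budgets.
print(patch_grid(600, 800, 2048))  # (39, 52)
print(patch_grid(600, 800, 4096))  # (55, 73)
```

Under these assumptions, the larger budget gives the encoder roughly twice as many patches for the same image, which tends to matter for dense tables with small fonts, at the cost of more computation per image.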

Core Capabilities

Table structure extraction and conversion to LaTeX
Multi-lingual support (English and Chinese)
Scientific document table processing
High-speed inference (≈1 second on A100 GPU)
Format conversion flexibility (LaTeX, HTML, Markdown)
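
The format conversion listed above can be illustrated with a small, self-contained sketch that turns a simple LaTeX tabular into a Markdown table. This is not the converter shipped with the model; it is a simplified stand-in that assumes no merged cells (\multicolumn, \multirow), no escaped ampersands inside cells, and a column specification without nested braces. The function name latex_tabular_to_markdown and the sample table are hypothetical.

```python
import re


def latex_tabular_to_markdown(latex: str) -> str:
    """Convert a simple LaTeX tabular (no merged cells) into a Markdown table."""
    # Keep only the body between \begin{tabular}{...} and \end{tabular}.
    match = re.search(r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", latex, re.DOTALL)
    if match is None:
        raise ValueError("no tabular environment found")
    body = match.group(1)

    # Drop horizontal rule commands, which have no Markdown equivalent.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)\b", "", body)

    # Rows end with a double backslash; cells are separated by '&'.
    rows = []
    for raw_row in body.split(r"\\"):
        if raw_row.strip():
            rows.append([cell.strip() for cell in raw_row.split("&")])
    if not rows:
        raise ValueError("tabular has no rows")

    # Pad ragged rows so every row has the same number of cells.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    # The first row becomes the Markdown header.
    lines = ["| " + " | ".join(rows[0]) + " |"]
    lines.append("| " + " | ".join(["---"] * width) + " |")
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


sample = r"""\begin{tabular}{lc}
\hline
Model & Params \\
\hline
StructTable-base & 324M \\
\hline
\end{tabular}"""

print(latex_tabular_to_markdown(sample))
```

Markdown tables cannot express merged cells, so tables that rely on them are better kept in LaTeX or converted to HTML.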

Frequently Asked Questions

Q: What makes this model unique?

StructTable-base stands out for its focus on scientific tables. It handles complex table structures, such as merged cells, with high accuracy and fast inference (about 1 second on an A100 GPU). Its TensorRT-LLM acceleration makes it particularly suitable for production environments.

Q: What are the recommended use cases?

The model is best suited for converting scientific document tables, particularly from arXiv and SciHub papers, into LaTeX code. It's ideal for academic document processing, scientific research, and automated document conversion systems where accurate table structure preservation is crucial.