StructTable-base

Maintained by: U4R

Parameter Count: 324M
License: Apache 2.0
Paper: arXiv:2406.11633
Languages: English, Chinese
Model Type: Image-to-Text

What is StructTable-base?

StructTable-base is an image-to-text model that converts table images into LaTeX code, with a focus on scientific documents. Trained on the DocGenome benchmark, it handles complex tables with merged cells and intricate layouts drawn from sources such as arXiv and SciHub papers.

Implementation Details

The model is a 324M-parameter transformer built on the Pix2Struct framework for image-to-text conversion. It supports image token budgets of 4096 and 2048, and the latest version improves inference speed through TensorRT acceleration.
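To illustrate what the two image token budgets mean in practice, here is a minimal sketch of how a Pix2Struct-style preprocessor could choose a patch grid for a given image. It assumes 16x16 patches, an aspect-ratio-preserving resize, and one image token per patch. The function name `patch_grid` is hypothetical, and the exact preprocessing used by StructTable-base may differ.

```python
import math


def patch_grid(height: int, width: int, max_patches: int, patch: int = 16) -> tuple[int, int]:
    """Return (rows, cols) of patches that fit within a patch budget.

    Assumption: the image is rescaled, keeping its aspect ratio, so that
    rows * cols is as large as possible without exceeding max_patches.
    """
    scale = math.sqrt(max_patches * (patch / height) * (patch / width))
    rows = max(min(math.floor(scale * height / patch), max_patches), 1)
    cols = max(min(math.floor(scale * width / patch), max_patches), 1)
    return rows, cols


# Compare the two budgets on a wide table image (500 px high, 1300 px wide).
for budget in (2048, 4096):
    rows, cols = patch_grid(500, 1300, budget)
    print(f"budget={budget}: {rows} x {cols} = {rows * cols} patches")
```

Under these assumptions, a 500x1300 image maps to a 28x72 grid with the 2048 budget and a 39x103 grid with the 4096 budget, so the larger budget keeps roughly twice as many patches at the cost of a longer input sequence.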

  • Primarily supports tables set in Times New Roman and Songti (宋体)
  • Uses TensorRT-LLM acceleration for a reported 10x inference speedup
  • Processes merged cells and complex table structures
  • Outputs LaTeX, with optional conversion to HTML or Markdown
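As an illustration of the LaTeX-to-HTML step and of how merged cells carry over, the sketch below converts a simple `tabular` block to HTML and maps `\multicolumn` to `colspan`. It is not the converter shipped with the model: `tabular_to_html` is a hypothetical helper, and it assumes a flat column specification, no escaped `\&`, and no nested braces inside merged cells.

```python
import html
import re

# Matches a cell of the form \multicolumn{N}{spec}{content}.
MULTICOL = re.compile(r"\\multicolumn\{(\d+)\}\{[^}]*\}\{(.*)\}\s*$", re.S)


def tabular_to_html(latex: str) -> str:
    """Convert a simple LaTeX tabular to an HTML table (illustrative only)."""
    # Keep only the rows between \begin{tabular}{...} and \end{tabular}.
    body = re.search(
        r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", latex, re.S
    ).group(1)
    # Horizontal rules carry no cell content, so drop them.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)", "", body)

    rows = []
    for raw_row in body.split(r"\\"):  # LaTeX rows end with a double backslash
        if not raw_row.strip():
            continue
        cells = []
        for cell in raw_row.split("&"):
            cell = cell.strip()
            merged = MULTICOL.match(cell)
            if merged:
                span, text = merged.group(1), merged.group(2).strip()
                cells.append(f'<td colspan="{span}">{html.escape(text)}</td>')
            else:
                cells.append(f"<td>{html.escape(cell)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


sample = r"""\begin{tabular}{lcc}
\hline
Model & \multicolumn{2}{c}{Score} \\
\hline
A & 1 & 2 \\
\hline
\end{tabular}"""

print(tabular_to_html(sample))
```

Vertical merges (`\multirow`), math, and text formatting commands are passed through as literal text here; a production converter would need a real LaTeX parser.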

Core Capabilities

  • Table structure extraction and conversion to LaTeX
  • Multilingual support (English and Chinese)
  • Scientific document table processing
  • High-speed inference (≈1 second on an A100 GPU)
  • Format conversion flexibility (LaTeX, HTML, Markdown)

Frequently Asked Questions

Q: What makes this model unique?

StructTable-base stands out for its focus on scientific tables and its ability to handle complex structures, such as merged cells, accurately and quickly. Its TensorRT-LLM acceleration makes it particularly suitable for production environments.

Q: What are the recommended use cases?

The model is best suited to converting tables from scientific documents, particularly arXiv and SciHub papers, into LaTeX code. It fits academic document processing, scientific research, and automated document conversion systems where preserving table structure accurately is crucial.
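In an automated conversion pipeline, a cheap structural check on the generated LaTeX can catch malformed tables before they reach downstream tools. The sketch below counts the effective width of each row, treating `\multicolumn{N}` as N columns, and flags tables whose rows disagree. The helper names `row_widths` and `is_rectangular` are hypothetical, and the check assumes a simple `tabular` with no escaped `\&`.

```python
import re


def row_widths(latex: str) -> list[int]:
    """Return the effective column count of each row in a LaTeX tabular."""
    body = re.search(
        r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", latex, re.S
    ).group(1)
    # Rules such as \hline are not rows, so remove them first.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)", "", body)

    widths = []
    for row in body.split(r"\\"):  # rows end with a double backslash
        if not row.strip():
            continue
        width = 0
        for cell in row.split("&"):
            merged = re.match(r"\s*\\multicolumn\{(\d+)\}", cell)
            # A merged cell occupies as many columns as it spans.
            width += int(merged.group(1)) if merged else 1
        widths.append(width)
    return widths


def is_rectangular(latex: str) -> bool:
    """True when every row spans the same number of columns."""
    return len(set(row_widths(latex))) <= 1


good = r"\begin{tabular}{lcc} Model & \multicolumn{2}{c}{Score} \\ A & 1 & 2 \\ \end{tabular}"
bad = r"\begin{tabular}{lcc} Model & \multicolumn{2}{c}{Score} \\ A & 1 \\ \end{tabular}"

print(row_widths(good), is_rectangular(good))
print(row_widths(bad), is_rectangular(bad))
```

Rows that fail the check can be routed to manual review or re-run with a larger image token budget, depending on how the pipeline is set up.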
