StructTable-base
| Property | Value |
|---|---|
| Parameter Count | 324M |
| License | Apache 2.0 |
| Paper | arXiv:2406.11633 |
| Languages | English, Chinese |
| Model Type | Image-to-Text |
What is StructTable-base?
StructTable-base is a specialized model that converts images of tables into LaTeX code, with a focus on scientific documents. It is trained on the DocGenome benchmark and handles complex tables, including merged cells and sophisticated layouts, from sources such as arXiv and SciHub papers.
Implementation Details
The model is a 324M-parameter transformer built on the Pix2Struct framework for image-to-text conversion. It supports two image-token configurations, 4096 and 2048, and the latest version speeds up inference through TensorRT acceleration.
- Primarily supports tables set in Times New Roman and Songti (宋体) fonts
- Uses TensorRT-LLM acceleration for 10x faster inference
- Processes merged cells and complex table structures
- Outputs in LaTeX format with HTML/Markdown conversion options
Core Capabilities
- Table structure extraction and conversion to LaTeX
- Multi-lingual support (English and Chinese)
- Scientific document table processing
- High-speed inference (≈1 second per image on an A100 GPU)
- Format conversion flexibility (LaTeX, HTML, Markdown)
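As an illustration of the format conversion step, the sketch below turns a simple LaTeX `tabular` into a Markdown table using only the standard library. It is not the project's own converter: it assumes a flat column specification and no merged cells (`\multicolumn` and `\multirow` are not handled), and the function name `latex_tabular_to_markdown` is hypothetical.

```python
import re


def latex_tabular_to_markdown(latex: str) -> str:
    """Convert a simple LaTeX tabular (no merged cells) to a Markdown table."""
    # Keep only the body between \begin{tabular}{...} and \end{tabular}.
    match = re.search(
        r"\\begin\{tabular\}\{[^}]*\}(.*?)\\end\{tabular\}", latex, re.DOTALL
    )
    if match is None:
        raise ValueError("no tabular environment found")
    body = match.group(1)

    # Drop horizontal-rule commands, which carry no cell content.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)\b", "", body)

    rows = []
    for raw_row in body.split(r"\\"):  # rows end with a double backslash
        if not raw_row.strip():
            continue
        # Split on column separators, ignoring escaped ampersands (\&).
        rows.append([cell.strip() for cell in re.split(r"(?<!\\)&", raw_row)])
    if not rows:
        raise ValueError("tabular environment has no rows")

    # Pad short rows so every row has the same number of columns.
    width = max(len(row) for row in rows)
    rows = [row + [""] * (width - len(row)) for row in rows]

    def fmt(cells):
        return "| " + " | ".join(cells) + " |"

    # The first row is treated as the header row.
    lines = [fmt(rows[0]), fmt(["---"] * width)]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)


sample = r"""
\begin{tabular}{lc}
\hline
Model & Params \\
\hline
StructTable-base & 324M \\
\hline
\end{tabular}
"""
print(latex_tabular_to_markdown(sample))
```

Tables with merged cells cannot be expressed in plain Markdown, so for the complex layouts this model targets, HTML output or a full document converter is likely a better fit than a hand-rolled parser like this one.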
Frequently Asked Questions
Q: What makes this model unique?
StructTable-base stands out for its focus on scientific tables: it handles complex structures such as merged cells and produces LaTeX that can be converted to HTML or Markdown. Its TensorRT-LLM acceleration also makes it well suited to production environments where inference speed matters.
Q: What are the recommended use cases?
The model is best suited for converting scientific document tables, particularly from arXiv and SciHub papers, into LaTeX code. It's ideal for academic document processing, scientific research, and automated document conversion systems where accurate table structure preservation is crucial.