TB-OCR-preview-0.1
| Property | Value |
|---|---|
| Parameter Count | 4.25B |
| Model Type | OCR / Image-Text-to-Text |
| License | MIT |
| Base Model | microsoft/Phi-3.5-vision-instruct |
| Memory Requirement | 2.8 GB VRAM (4-bit quantized) |
What is TB-OCR-preview-0.1?
TB-OCR-preview-0.1 is an end-to-end OCR model developed by Yifei Hu that handles plain text, mathematical LaTeX expressions, and markdown formatting in a single pass. Trained on approximately 250k image-text pairs (~50M tokens), this preview model removes the need for separate line-detection or math-formula-detection stages.
Implementation Details
Built on Microsoft's Phi-3.5-vision-instruct architecture, TB-OCR is a vision-language transformer that can be run efficiently with 4-bit quantization, requiring only about 2.8 GB of VRAM. The model takes an image of a single text block as input and generates clean markdown output, with special handling for headers and mathematical expressions; a minimal loading and inference sketch follows the feature list below.
- Efficient 4-bit quantization support
- Flash Attention 2 implementation
- BF16 tensor type optimization
- Markdown-formatted output generation
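For illustration, here is a minimal loading and inference sketch written in the style of the Phi-3.5-vision examples. The repository id `yifeihu/TB-OCR-preview-0.1`, the prompt wording, the generation settings, and the `ocr_block` helper name are assumptions not confirmed by this page; consult the model repository for the recommended snippet.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

model_id = "yifeihu/TB-OCR-preview-0.1"  # assumed repository id

# 4-bit NF4 quantization with BF16 compute keeps the model around 2.8 GB of VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    trust_remote_code=True,                    # Phi-3.5-vision ships custom modeling code
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # Flash Attention 2, as listed above
    quantization_config=quant_config,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def ocr_block(image: Image.Image, max_new_tokens: int = 1024) -> str:
    """OCR a single cropped text block into markdown (prompt wording is assumed)."""
    question = "Convert the text to markdown format."
    prompt = f"<|user|>\n<|image_1|>\n{question}<|end|>\n<|assistant|>\n"
    inputs = processor(prompt, [image], return_tensors="pt").to("cuda")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        eos_token_id=processor.tokenizer.eos_token_id,
    )
    # Drop the prompt tokens and decode only the generated continuation.
    output_ids = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()

# markdown_text = ocr_block(Image.open("text_block.png"))
```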
Core Capabilities
- Automatic header detection and markdown formatting (##)
- Mathematical expression handling with proper LaTeX bracketing (see the normalization sketch after this list)
- Integrated text and formula recognition
- Efficient processing of text blocks
- Support for batch inference and parallel processing
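As a small post-processing illustration, the sketch below normalizes one block's output for a standard Markdown plus MathJax toolchain. It assumes (not confirmed on this page) that the model emits `##` headers and wraps inline math in `\( ... \)` and display math in `\[ ... \]`; the helper name is hypothetical.

```python
import re

def normalize_block_output(md: str) -> str:
    r"""Normalize one TB-OCR block result for a Markdown/MathJax pipeline.

    Assumes the model marks headers with '##' and brackets math with
    \( ... \) (inline) and \[ ... \] (display); both are assumptions.
    """
    # Convert display math \[ ... \] to $$ ... $$ for common Markdown renderers.
    md = re.sub(r"\\\[(.+?)\\\]", r"$$\1$$", md, flags=re.DOTALL)
    # Convert inline math \( ... \) to $ ... $.
    md = re.sub(r"\\\((.+?)\\\)", r"$\1$", md, flags=re.DOTALL)
    # Collapse stray blank lines and trim surrounding whitespace.
    md = re.sub(r"\n{3,}", "\n\n", md).strip()
    return md

print(normalize_block_output("## Theorem\n\nLet \\( x \\) satisfy \\[ x^2 = 2. \\]"))
```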
Frequently Asked Questions
Q: What makes this model unique?
TB-OCR stands out for its ability to handle plain text, LaTeX math, and markdown formatting in a single pass, eliminating the need for separate detection steps. Its 4-bit quantization support allows high-performance OCR with minimal VRAM requirements.
Q: What are the recommended use cases?
The model is specifically designed to process individual text blocks rather than full pages. For full-page OCR, it is recommended to pair it with TFT-ID-1.0 for text/table/figure detection and to OCR the resulting text blocks in parallel for better throughput.
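A hypothetical sketch of that two-stage pipeline is shown below. It assumes TFT-ID-1.0 is published as `yifeihu/TFT-ID-1.0` and follows the standard Florence-2 object-detection interface (`<OD>` task prompt, boxes returned by `post_process_generation`), that detected regions are labeled `text`/`table`/`figure`, and it reuses the `ocr_block` helper from the loading sketch above; none of these specifics are stated on this page.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Stage 1: text/table/figure block detection (TFT-ID-1.0 is Florence-2 based).
det_id = "yifeihu/TFT-ID-1.0"  # assumed repository id
det_model = AutoModelForCausalLM.from_pretrained(det_id, trust_remote_code=True).to("cuda")
det_processor = AutoProcessor.from_pretrained(det_id, trust_remote_code=True)

def detect_blocks(page: Image.Image):
    """Return (label, bbox) pairs for regions detected on a full page."""
    inputs = det_processor(text="<OD>", images=page, return_tensors="pt").to("cuda")
    ids = det_model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
    )
    raw = det_processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = det_processor.post_process_generation(
        raw, task="<OD>", image_size=(page.width, page.height)
    )
    return list(zip(parsed["<OD>"]["labels"], parsed["<OD>"]["bboxes"]))

# Stage 2: crop each text block and OCR it with TB-OCR. The crops are
# independent, so in practice they can be sent to several workers or an
# inference server in parallel instead of this sequential loop.
def ocr_page(page: Image.Image) -> str:
    blocks = []
    for label, (x1, y1, x2, y2) in detect_blocks(page):
        if label != "text":  # label name is assumed; tables/figures need separate handling
            continue
        crop = page.crop((int(x1), int(y1), int(x2), int(y2)))
        blocks.append(ocr_block(crop))  # ocr_block: see the loading sketch above
    return "\n\n".join(blocks)

# page_markdown = ocr_page(Image.open("page_01.png").convert("RGB"))
```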