TB-OCR-preview-0.1

Maintained By
yifeihu

Property            Value
Parameter Count     4.25B
Model Type          OCR / Image-Text-to-Text
License             MIT
Base Model          Microsoft/Phi-3.5-vision-instruct
Memory Requirement  2.8GB VRAM (4-bit)

What is TB-OCR-preview-0.1?

TB-OCR-preview-0.1 is an end-to-end OCR model developed by Yifei Hu that handles plain text, LaTeX math expressions, and markdown formatting in a single pass. This preview release was trained on approximately 250k image-text pairs (~50M tokens). Because the model reads a text block directly, it needs no separate line detection or math formula detection step.

Implementation Details

TB-OCR is built on Microsoft's Phi-3.5-vision-instruct, a transformer-based vision-language model, and can run with 4-bit quantization in 2.8GB of VRAM. It takes an image of a text block as input and generates clean markdown output, with special handling for headers and mathematical expressions.

  • Efficient 4-bit quantization support
  • Flash Attention 2 implementation
  • BF16 tensor type optimization
  • Markdown-formatted output generation
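
The options listed above map fairly directly onto Hugging Face `transformers` loading arguments. The following is a minimal sketch rather than the maintainer's reference code: the Hub id `yifeihu/TB-OCR-preview-0.1` is an assumption inferred from the maintainer and model names, and the helper names `build_load_kwargs` and `load_tb_ocr` are hypothetical. It also assumes a CUDA GPU with the `bitsandbytes` and `flash-attn` packages installed.

```python
MODEL_ID = "yifeihu/TB-OCR-preview-0.1"  # assumed Hub id, not stated on this page


def build_load_kwargs(use_flash_attention=True):
    """Collect from_pretrained keyword arguments without importing torch."""
    kwargs = {
        "trust_remote_code": True,  # the Phi-3.5-vision base relies on custom model code
        "device_map": "cuda",
    }
    if use_flash_attention:
        # Needs the flash-attn package and a GPU that supports it.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_tb_ocr(model_id=MODEL_ID, use_4bit=True, use_flash_attention=True):
    """Load model and processor; heavy imports happen only when this is called."""
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

    kwargs = build_load_kwargs(use_flash_attention)
    kwargs["torch_dtype"] = torch.bfloat16  # BF16 tensor type
    if use_4bit:
        # 4-bit weights via bitsandbytes, with BF16 used for computation.
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor
```

Calling `load_tb_ocr(use_flash_attention=False)` falls back to the default attention implementation on hardware without Flash Attention 2 support.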

Core Capabilities

  • Automatic header detection and markdown formatting (##)
  • Mathematical expression handling with proper LaTeX bracketing
  • Integrated text and formula recognition
  • Efficient processing of text blocks
  • Support for batch inference and parallel processing
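
To make the output format concrete, here is a small sketch that pulls headers and math spans out of a piece of generated markdown. It assumes that "LaTeX bracketing" means `\( ... \)` for inline math and `\[ ... \]` for display math; this page does not spell out the delimiters, so treat that as an assumption. The `summarize_output` helper and the sample string are hypothetical and are not real model output.

```python
import re

# Assumed conventions: "## " headers, \( ... \) inline math, \[ ... \] display math.
HEADER_RE = re.compile(r"^##[ \t]+(.*)$", re.MULTILINE)
INLINE_MATH_RE = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
DISPLAY_MATH_RE = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)


def summarize_output(markdown_text):
    """Return the headers and math expressions found in OCR markdown output."""
    return {
        "headers": [h.strip() for h in HEADER_RE.findall(markdown_text)],
        "inline_math": [m.strip() for m in INLINE_MATH_RE.findall(markdown_text)],
        "display_math": [m.strip() for m in DISPLAY_MATH_RE.findall(markdown_text)],
    }


# Hand-written sample in the assumed format.
sample = r"""## Energy
Mass and energy are related by \(E = mc^2\).
\[
\int_0^1 x\,dx = \frac{1}{2}
\]
"""

summary = summarize_output(sample)
print(summary["headers"])
print(summary["inline_math"])
print(summary["display_math"])
```

A check like this can serve as a quick sanity test on a batch of results, for example to flag blocks whose math delimiters are unbalanced.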

Frequently Asked Questions

Q: What makes this model unique?

TB-OCR handles text, LaTeX math, and markdown formatting in a single pass, with no separate detection steps. With 4-bit quantization it runs in 2.8GB of VRAM.
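
A back-of-the-envelope calculation shows why 4-bit quantization matters for a model of this size. The sketch below counts only the weights, using the 4.25B parameter count listed at the top of this page. The gap between the 4-bit weight size and the quoted 2.8GB figure is presumably taken up by quantization metadata, any layers kept in higher precision, and activations, but that breakdown is an assumption and is not given in the source.

```python
PARAMS = 4.25e9            # parameter count listed for the model
BYTES_PER_GIB = 1024 ** 3


def weight_gib(params, bits_per_param):
    """Size of the raw weights in GiB at a given precision."""
    return params * bits_per_param / 8 / BYTES_PER_GIB


bf16 = weight_gib(PARAMS, 16)   # 2 bytes per parameter
int4 = weight_gib(PARAMS, 4)    # half a byte per parameter

print(f"BF16 weights:  {bf16:.2f} GiB")
print(f"4-bit weights: {int4:.2f} GiB")
```

The weights alone come to roughly 7.9 GiB in BF16 and roughly 2.0 GiB at 4 bits.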

Q: What are the recommended use cases?

The model is specifically designed for processing individual text blocks rather than full pages. For full-page OCR, it's recommended to use it in conjunction with TFT-ID-1.0 for text/table/figure detection and to process larger text blocks in parallel for optimal performance.
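
The full-page workflow described above can be sketched as a detect, crop, and recognize pipeline. Everything model-specific is passed in as a callable, because this page does not document the TFT-ID-1.0 or TB-OCR calling conventions: `detect_blocks`, `crop`, `ocr_block`, the `Block` type, and the `"text"` label are all hypothetical stand-ins. The reading order used here (top to bottom, then left to right) is also a simplification that will not suit multi-column layouts.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Block:
    box: Box
    label: str  # hypothetical detector labels, e.g. "text", "table", "figure"


def ocr_page(page, detect_blocks, crop, ocr_block, max_workers=4):
    """Detect text blocks on a page, OCR them in parallel, and join the results."""
    blocks = [b for b in detect_blocks(page) if b.label == "text"]
    # Simplified reading order: sort by top edge, then by left edge.
    blocks.sort(key=lambda b: (b.box[1], b.box[0]))
    crops = [crop(page, b.box) for b in blocks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so reading order is preserved.
        texts = list(pool.map(ocr_block, crops))
    return "\n\n".join(t.strip() for t in texts if t.strip())
```

Threads help most when `ocr_block` calls a remote endpoint or several model replicas. With a single local GPU model, batching the crops into one forward pass may be the better fit.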
