TB-OCR-preview-0.1

yifeihu

End-to-end OCR model (4.25B parameters) that handles text, math LaTeX, and markdown formats. Runs in 4-bit with 2.8GB VRAM. Based on Phi-3.5-vision-instruct.

Property            Value
Parameter Count     4.25B
Model Type          OCR / Image-Text-to-Text
License             MIT
Base Model          Microsoft/Phi-3.5-vision-instruct
Memory Requirement  2.8GB VRAM (4-bit)

What is TB-OCR-preview-0.1?

TB-OCR-preview-0.1 is an end-to-end OCR model developed by Yifei Hu that handles text, mathematical LaTeX expressions, and markdown formatting in a single model. This preview release was trained on approximately 250k image-text pairs (~50M tokens). Because it transcribes a text block directly, it does not need a separate line detection or math formula detection step.

Implementation Details

Built on Microsoft's Phi-3.5-vision-instruct, a transformer-based vision-language model, TB-OCR can be run with 4-bit quantization, which requires only 2.8GB of VRAM. The model takes an image of a text block as input and generates clean markdown output, with special handling for headers and mathematical expressions.

  • Efficient 4-bit quantization support
  • Flash Attention 2 implementation
  • BF16 tensor type optimization
  • Markdown-formatted output generation
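
The features listed above map onto a fairly standard Hugging Face Transformers loading call. The following is a minimal sketch, not the author's published code. It assumes the weights are hosted under the repo id `yifeihu/TB-OCR-preview-0.1` (inferred from the author and model name on this page), that `transformers`, `bitsandbytes`, `accelerate`, and `flash-attn` are installed, and that a CUDA GPU is available. The helper names `build_load_kwargs` and `load_model` are hypothetical.

```python
# Assumed Hugging Face repo id, inferred from the author and model name on this page.
MODEL_ID = "yifeihu/TB-OCR-preview-0.1"


def build_load_kwargs(use_flash_attention: bool = True) -> dict:
    """Collect from_pretrained keyword arguments as plain values."""
    kwargs = {
        "trust_remote_code": True,   # the Phi-3.5-vision base ships custom modeling code
        "torch_dtype": "bfloat16",   # BF16 tensors; converted to a torch dtype in load_model
        "device_map": "cuda",        # assumption: a single CUDA device
    }
    if use_flash_attention:
        # Requires the flash-attn package and a supported GPU.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model(use_flash_attention: bool = True):
    """Load the model in 4-bit. Heavy imports stay local to this function."""
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

    kwargs = build_load_kwargs(use_flash_attention)
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    kwargs["quantization_config"] = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    return model, processor
```

If Flash Attention 2 is not available on your hardware, calling `load_model(use_flash_attention=False)` falls back to the library's default attention implementation.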

Core Capabilities

  • Automatic header detection and markdown formatting (##)
  • Mathematical expression handling with proper LaTeX bracketing
  • Integrated text and formula recognition
  • Efficient processing of text blocks
  • Support for batch inference and parallel processing
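
Since the output is markdown with `##` headers and bracketed LaTeX, downstream code can pull those pieces apart with a few regular expressions. The sketch below is illustrative only. It assumes inline math is wrapped in `\( ... \)` and display math in `\[ ... \]`, which this page does not spell out, and the function name `summarize_block` is hypothetical.

```python
import re

# Assumed delimiters: \( ... \) for inline math, \[ ... \] for display math.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)
# Headers are lines that start with "## " as described in the capability list.
HEADER = re.compile(r"^##\s+(.+?)\s*$", re.MULTILINE)


def summarize_block(markdown: str) -> dict:
    """Split one OCR result into headers, inline math, and display math."""
    return {
        "headers": HEADER.findall(markdown),
        "inline_math": [m.strip() for m in INLINE_MATH.findall(markdown)],
        "display_math": [m.strip() for m in DISPLAY_MATH.findall(markdown)],
    }


# Hand-written sample text, not real model output.
sample = "## Energy\nThe relation \\( E = mc^2 \\) holds.\n\\[ a^2 + b^2 = c^2 \\]\n"
print(summarize_block(sample))
```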

Frequently Asked Questions

Q: What makes this model unique?

TB-OCR handles text, math LaTeX, and markdown formats in a single pass, with no separate detection steps. With 4-bit quantization it runs in about 2.8GB of VRAM.
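
A back-of-the-envelope calculation shows why 4-bit quantization matters for a 4.25B-parameter model. The sketch below counts raw weight storage only. The gap between the 4-bit weight size and the 2.8GB quoted on this page presumably comes from layers kept at higher precision, quantization metadata, activations, and runtime overhead, but that breakdown is an assumption and is not stated in the source.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 4.25e9  # parameter count quoted on this page

bf16_gb = weight_memory_gb(N_PARAMS, 16)  # 2 bytes per parameter
int4_gb = weight_memory_gb(N_PARAMS, 4)   # half a byte per parameter

print(f"BF16 weights:  {bf16_gb:.2f} GB")
print(f"4-bit weights: {int4_gb:.2f} GB")
```

Under these assumptions the weights alone shrink by a factor of four, from roughly 8.5 GB in BF16 to a little over 2 GB in 4-bit.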

Q: What are the recommended use cases?

The model is designed for individual text blocks rather than full pages. For full-page OCR, first run TFT-ID-1.0 to detect text, table, and figure regions, then process the larger text blocks in parallel with TB-OCR for better throughput.
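
The detect-then-transcribe workflow can be sketched as a small pipeline. This is a structural illustration, not the API of TFT-ID-1.0 or TB-OCR: the `Block` type, its labels, and the `ocr_block` callable are hypothetical stand-ins for whatever your detector and OCR wrapper return. Threads are used here for simplicity. On a single GPU, batching the crops into one forward pass may be the better form of parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Block:
    """One detected page region (hypothetical container for detector output)."""
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    label: str                       # e.g. "text", "table", "figure"
    image: Any                       # cropped image for this region


def ocr_page(
    blocks: List[Block],
    ocr_block: Callable[[Any], str],
    max_workers: int = 4,
) -> str:
    """OCR every text block concurrently and join results in reading order."""
    text_blocks = [b for b in blocks if b.label == "text"]
    # Naive reading order: top-to-bottom, then left-to-right.
    text_blocks.sort(key=lambda b: (b.bbox[1], b.bbox[0]))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so reading order is kept.
        results = list(pool.map(ocr_block, (b.image for b in text_blocks)))
    return "\n\n".join(results)
```

In practice `ocr_block` would wrap a call to the loaded model. For a dry run, any function that maps a crop to a string works, which makes the ordering logic easy to check without a GPU.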
