trocr-large-str

TrOCR large model specialized for scene text recognition (STR). It uses a Transformer encoder-decoder architecture with an image encoder initialized from BEiT and a text decoder initialized from RoBERTa, and is fine-tuned on several scene text recognition benchmarks.

Property        Value
Author          Microsoft
Research Paper  arXiv:2109.10282
Downloads       1,956
Tags            Image-to-Text, Transformers, Vision-encoder-decoder

What is trocr-large-str?

TrOCR-large-str is an optical character recognition (OCR) model that pairs a pre-trained vision Transformer with a pre-trained language Transformer. It is fine-tuned on several scene text recognition benchmarks, including IC13, IC15, IIIT5K, and SVT, which targets it at text found in natural images such as signs, storefronts, and product labels.

Implementation Details

The model uses an encoder-decoder architecture: an image Transformer encoder initialized from BEiT weights and a text Transformer decoder initialized from RoBERTa weights. The input image is split into 16x16 pixel patches, and positional embeddings are added to the patch embeddings before the sequence passes through the encoder layers. The decoder then generates the output text one token at a time.
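To make the patch step concrete, here is a minimal sketch of how an image is divided into 16x16 patches. The 384x384 input size is an assumption taken from the TrOCR paper, not from this page, and the helper names `patch_grid` and `patch_boxes` are hypothetical. The real encoder may also prepend special tokens, so its sequence length can differ slightly from the raw patch count.

```python
from typing import Iterator, Tuple


def patch_grid(height: int, width: int, patch: int = 16) -> Tuple[int, int]:
    """Return (rows, cols) of non-overlapping patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return height // patch, width // patch


def patch_boxes(height: int, width: int, patch: int = 16) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, right, bottom) boxes in row-major order.

    Row-major order is the order in which the 2D grid is flattened
    into a 1D sequence for the Transformer encoder.
    """
    rows, cols = patch_grid(height, width, patch)
    for r in range(rows):
        for c in range(cols):
            yield (c * patch, r * patch, (c + 1) * patch, (r + 1) * patch)


# Assumed input resolution: 384x384 (not stated on this page).
rows, cols = patch_grid(384, 384)
num_patches = rows * cols
print(rows, cols, num_patches)  # 24 24 576
```

Each box corresponds to one position in the encoder's input sequence, which is why positional embeddings are needed: without them the encoder could not tell where in the image a patch came from.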

  • Vision encoder initialized from BEiT weights
  • Text decoder initialized from RoBERTa weights
  • 16x16 pixel patch processing
  • Autoregressive token generation
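Autoregressive generation means each output token is chosen given all tokens produced so far. The sketch below illustrates the loop with greedy decoding over a toy vocabulary. It is not the model's actual decoder: `greedy_decode`, `toy_scores`, and the vocabulary are hypothetical, the scoring function ignores the image entirely, and real inference may use other strategies such as beam search.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_new_tokens: int = 20,
) -> List[int]:
    """Repeatedly append the highest-scoring next token until EOS."""
    tokens = [bos_id]
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)  # one score per vocabulary entry
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


# Toy stand-in for the decoder: always "reads" the word STOP.
VOCAB = ["<s>", "</s>", "S", "T", "O", "P"]
TARGET = [2, 3, 4, 5, 1]  # S, T, O, P, </s>


def toy_scores(prefix: Sequence[int]) -> List[float]:
    step = len(prefix) - 1  # tokens generated so far, excluding <s>
    want = TARGET[min(step, len(TARGET) - 1)]
    return [1.0 if i == want else 0.0 for i in range(len(VOCAB))]


ids = greedy_decode(toy_scores, bos_id=0, eos_id=1)
text = "".join(VOCAB[i] for i in ids if i not in (0, 1))
print(text)  # STOP
```

In the real model, the scores at each step also depend on the encoder's patch representations, which is how the generated text is tied to the image.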

Core Capabilities

  • Single text-line image recognition
  • Scene text recognition
  • Document text extraction, once the page has been split into single text-line images
  • Handling of the varied text styles and orientations found in natural scenes

Frequently Asked Questions

Q: What makes this model unique?

The model initializes both halves from pre-trained checkpoints (BEiT for vision, RoBERTa for language) and then fine-tunes them on several scene text recognition benchmarks. It uses Transformers for both encoding the image and decoding the text.

Q: What are the recommended use cases?

The model is designed for OCR on single text-line images. It suits scene text recognition and can support document processing when each text line is cropped out first; it does not locate text within a larger image on its own.
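A minimal usage sketch with the Hugging Face `transformers` library follows. It assumes the checkpoint is available on the Hugging Face Hub as `microsoft/trocr-large-str` and that `transformers`, `torch`, and `Pillow` are installed; the function name `recognize_line` is hypothetical, and the input is expected to be an image already cropped to one text line.

```python
import sys

# Assumed Hub identifier for this checkpoint.
MODEL_ID = "microsoft/trocr-large-str"


def recognize_line(image_path: str, model_id: str = MODEL_ID) -> str:
    """Return the text recognized in a single text-line image."""
    # Imported lazily so that importing this module stays cheap.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # The processor expects an RGB image and handles resizing and normalization.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Autoregressive generation, then token ids back to a string.
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(recognize_line(sys.argv[1]))
```

Loading the processor and model on every call keeps the sketch short; for repeated use it would be better to load them once and reuse them across images.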
