Florence-2-base


microsoft

Microsoft's Florence-2-base is a 0.23B-parameter vision foundation model that handles multiple tasks, including captioning, object detection, and OCR, with strong zero-shot performance.

Property | Value
Parameter Count | 0.23B
License | MIT
Paper | arXiv:2311.06242
Architecture | Transformer-based vision foundation model

What is Florence-2-base?

Florence-2-base is a compact vision foundation model from Microsoft that uses a prompt-based approach to handle a range of vision and vision-language tasks. It was trained on FLD-5B, a dataset of 5.4 billion annotations across 126 million images, and is designed to cover many kinds of visual understanding with a single model.

Implementation Details

The model uses a sequence-to-sequence architecture and can be applied either zero-shot or after fine-tuning. It runs in PyTorch through the Transformers library and supports float16 precision for faster, lower-memory inference.

  • Leverages prompt-based task specification
  • Processes both image and text inputs
  • Supports batch processing and GPU acceleration
  • Implements beam search for generation tasks
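A minimal sketch of the prompt-based workflow, assuming the `trust_remote_code` loading path shown on the Hugging Face model card for `microsoft/Florence-2-base` (behavior may differ across `transformers` versions). A task token such as `<CAPTION>`, `<OD>`, or `<OCR>` selects the task, and tasks that take text, such as `<CAPTION_TO_PHRASE_GROUNDING>`, append it directly after the token. The helper names `build_prompt` and `run_florence` are hypothetical.

```python
from typing import Optional

MODEL_ID = "microsoft/Florence-2-base"


def build_prompt(task: str, text_input: Optional[str] = None) -> str:
    """Build a Florence-2 prompt: a task token, optionally followed by free text."""
    if not (task.startswith("<") and task.endswith(">")):
        raise ValueError(f"expected a task token like '<CAPTION>', got {task!r}")
    return task if text_input is None else task + text_input


def run_florence(image_path: str, task: str, text_input: Optional[str] = None, num_beams: int = 3):
    """Run one prediction (hypothetical helper; needs torch, transformers, Pillow and a model download)."""
    # Imported lazily so build_prompt() can be used without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    # float16 on GPU, float32 on CPU.
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    prompt = build_prompt(task, text_input)
    # Tokenizes the prompt and preprocesses the image; only float tensors are cast to dtype.
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=num_beams,  # beam search, as in the model card example
        do_sample=False,
    )
    # Keep special tokens: the post-processor needs the location tokens.
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Returns a dict keyed by the task token, e.g. {"<OD>": {"bboxes": [...], "labels": [...]}}.
    return processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )
```

For example, `run_florence("car.jpg", "<CAPTION_TO_PHRASE_GROUNDING>", "A green car")` would ground the phrase in a local image (the path is hypothetical). Loading the model inside the function keeps the sketch short; in practice you would load it once and reuse it across tasks.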

Core Capabilities

  • Image Captioning (with multiple detail levels)
  • Object Detection (mAP 34.7 on COCO val2017)
  • Dense Region Captioning
  • OCR with Region Detection
  • Phrase Grounding
  • Visual Question Answering
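Because the model is sequence-to-sequence, region outputs such as detection boxes come back as text, with coordinates written as quantized location tokens. The processor's `post_process_generation` performs this conversion; the sketch below is a simplified stand-in for illustration only, assuming 1,000 bins per axis (`<loc_0>` to `<loc_999>`), bin-centre dequantization, and `<OD>`-style output in which each label is followed by exactly four tokens (x1, y1, x2, y2). Quadrilateral outputs such as `<OCR_WITH_REGION>` are not covered, and `parse_boxes` and `dequantize` are hypothetical names.

```python
import re
from typing import List, Tuple

# Assumption: 1000 quantization bins per axis, matching <loc_0>..<loc_999>.
NUM_BINS = 1000
_SPECIAL_TOKENS = ("<s>", "</s>", "<pad>")
# A label followed by exactly four location tokens: x1, y1, x2, y2.
_BOX_PATTERN = re.compile(r"([^<]+)((?:<loc_\d+>){4})")
_LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def dequantize(bin_index: int, size: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index to the pixel coordinate at the centre of that bin (assumed scheme)."""
    return (bin_index + 0.5) * size / num_bins


def parse_boxes(text: str, width: int, height: int) -> List[Tuple[str, List[float]]]:
    """Turn 'label<loc_a><loc_b><loc_c><loc_d>...' into (label, [x1, y1, x2, y2]) pairs."""
    for token in _SPECIAL_TOKENS:
        text = text.replace(token, "")
    results = []
    for label, locs in _BOX_PATTERN.findall(text):
        x1, y1, x2, y2 = (int(v) for v in _LOC_PATTERN.findall(locs))
        box = [
            dequantize(x1, width),
            dequantize(y1, height),
            dequantize(x2, width),
            dequantize(y2, height),
        ]
        results.append((label.strip(), box))
    return results
```

For instance, `parse_boxes("car<loc_0><loc_0><loc_999><loc_499>", 1000, 1000)` yields one box labelled `car` covering the left-to-right width and the top half of a 1000×1000 image.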

Frequently Asked Questions

Q: What makes this model unique?

Florence-2-base stands out for its ability to handle multiple vision tasks through simple prompts without task-specific fine-tuning, achieving strong zero-shot performance despite its relatively small size of 0.23B parameters.

Q: What are the recommended use cases?

The model is well suited to computer vision tasks such as image captioning (133.0 CIDEr on COCO), object detection, OCR, and visual grounding. It is particularly useful for applications that need several vision capabilities from a single model.
