Florence-2-base-ft

Maintained By: microsoft


Property | Value
Parameter Count | 0.23B
License | MIT
Paper | Florence-2 Paper
Model Type | Vision-Language Model

What is Florence-2-base-ft?

Florence-2-base-ft is a fine-tuned version of the base Florence-2 model, a versatile vision foundation model that handles multiple vision and vision-language tasks through a prompt-based approach. This 0.23B-parameter model has been fine-tuned on a collection of downstream tasks to improve its performance on them.

Implementation Details

The model uses a sequence-to-sequence architecture and was pretrained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images. It runs through Hugging Face's transformers library and can be used both zero-shot, via its task prompts, and as a starting point for further fine-tuning.

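  • Supports float16 inference (float32 is the usual choice on CPU)
  • Supports CUDA acceleration
  • Switches tasks through short task prompts (e.g. <OD> for object detection), so little prompt engineering is needed
  • Loads through the Hugging Face transformers Auto classes with trust_remote_code=True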
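A minimal loading and inference sketch, adapted from the usage pattern on the Hugging Face model card. It loads the model in float16 when a CUDA GPU is available and float32 otherwise. The helper names pick_device_and_dtype, load_florence and run_task are hypothetical, and the generation settings (num_beams=3, max_new_tokens=1024) are assumptions to tune for your workload. torch, transformers and Pillow are imported lazily so the module can be imported without them.

```python
from functools import lru_cache
from typing import Tuple

MODEL_ID = "microsoft/Florence-2-base-ft"


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: half precision on a CUDA GPU, full precision on CPU."""
    if cuda_available:
        return "cuda:0", "float16"
    return "cpu", "float32"


@lru_cache(maxsize=1)
def load_florence(model_id: str = MODEL_ID):
    # Deferred imports: only needed when the model is actually loaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    dtype = getattr(torch, dtype_name)
    # trust_remote_code=True runs the modeling code shipped in the model repository.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor, device, dtype


def run_task(image, task_prompt: str, text_input: str = ""):
    """Run one Florence-2 task on a PIL image and return the post-processed result."""
    model, processor, device, dtype = load_florence()
    # The task token and any free text are concatenated into a single prompt.
    inputs = processor(
        text=task_prompt + text_input, images=image, return_tensors="pt"
    ).to(device, dtype)
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
        do_sample=False,
    )
    # Keep special tokens: the post-processor uses them to recover boxes and labels.
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    return processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )
```

With Pillow installed, a call such as run_task(Image.open("photo.jpg").convert("RGB"), "<OD>") (the file name is a placeholder) returns a dict keyed by the task token. Caching the loader with lru_cache avoids reloading the weights on every call.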

Core Capabilities

  • Image Captioning (multiple detail levels)
  • Object Detection with bounding boxes
  • Dense Region Captioning
  • OCR with and without region detection
  • Caption to Phrase Grounding
  • Region Proposal Generation
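Each capability above is selected with a Florence-2 task token; phrase grounding additionally takes the caption text appended directly after its token. The sketch below maps the listed capabilities to those tokens. The dictionary keys and the build_prompt helper are hypothetical conveniences, not part of the model's API.

```python
from typing import Optional

# Hypothetical capability names mapped to Florence-2 task tokens.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "region_proposal": "<REGION_PROPOSAL>",
}

# Among the tasks above, only phrase grounding takes free text after its token.
NEEDS_TEXT = {"phrase_grounding"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for a capability, validating the text argument."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_PROMPTS)}")
    token = TASK_PROMPTS[task]
    if task in NEEDS_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs a text input")
        return token + text  # token and caption are concatenated without a separator
    if text:
        raise ValueError(f"task {task!r} does not take a text input")
    return token
```

Validating the text argument up front catches the common mistake of sending a grounding prompt without its caption, which would otherwise produce an unhelpful generation rather than an error.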

Frequently Asked Questions

Q: What makes this model unique?

Florence-2-base-ft stands out for handling many vision tasks with a single set of weights: the task is chosen by a short text prompt, so users do not have to fine-tune a separate model for each task. Despite its relatively small size (0.23B parameters), it achieves competitive results on standard benchmarks such as COCO Caption, COCO detection and VQAv2.

Q: What are the recommended use cases?

The model is well suited to scenarios that need several vision capabilities from one model, including image captioning (CIDEr score of 140.0 on COCO Caption), object detection (41.4 mAP on COCO Det.), and visual question answering (79.7% accuracy on VQAv2). It is particularly suitable for applications that combine image understanding with text output, such as captioning, OCR and grounding.
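For the object-detection use case, post_process_generation in the model card's example returns a dict keyed by the task token, shaped like {"<OD>": {"bboxes": [[x1, y1, x2, y2], ...], "labels": [...]}}. Assuming that shape, the sketch below pairs labels with boxes, drops boxes below a minimum area and counts objects per label. The function name and the min_area filter are hypothetical additions, and the sample data is made up for illustration.

```python
from collections import Counter
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def detections_from_od(parsed: dict, min_area: float = 0.0) -> List[Tuple[str, Box]]:
    """Pair labels with (x1, y1, x2, y2) boxes, keeping boxes with area >= min_area."""
    result = parsed["<OD>"]
    pairs = []
    for label, (x1, y1, x2, y2) in zip(result["labels"], result["bboxes"]):
        # Clamp negative widths/heights to zero so malformed boxes get area 0.
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        if area >= min_area:
            pairs.append((label, (x1, y1, x2, y2)))
    return pairs


# Made-up result in the documented shape, for illustration only.
sample = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 8.0, 9.0], [50.0, 60.0, 150.0, 160.0]],
        "labels": ["car", "car", "person"],
    }
}
pairs = detections_from_od(sample, min_area=100.0)  # the 3x4 box is filtered out
counts = Counter(label for label, _ in pairs)
```

Filtering by area is one simple way to suppress tiny spurious boxes; a score threshold is not used here because the documented <OD> output contains only boxes and labels.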
