swin-gportuguese-2

Maintained By: laicsiifes

Swin-GPorTuguese-2

Property            Value
Parameter Count     240M
Model Type          Vision Encoder Decoder
Primary Language    Brazilian Portuguese
Training Dataset    Flickr30K Portuguese
Base Models         Swin Transformer + GPT2-small-portuguese

What is swin-gportuguese-2?

Swin-GPorTuguese-2 is a vision-language model that generates image captions in Brazilian Portuguese. It pairs a Swin Transformer visual encoder pre-trained on ImageNet-1k with a GPT-2 Portuguese language decoder: the encoder turns the image into visual features, and the decoder generates the caption from those features.

Implementation Details

The encoder is a Swin Transformer base model with patch size 4 and window size 7, operating on 224x224 images. The decoder is pierreguillou's GPT2-small-portuguese model, which supports sequences of up to 1024 tokens. The model reports a CIDEr-D score of 64.71 and a BLEU@4 of 23.15.

  • Encoder pre-trained on ImageNet-1k for visual understanding
  • Fine-tuned on Flickr30K Portuguese, a translated version of the Flickr30K dataset
  • Supports 224x224 image resolution
  • Implements vision encoder-decoder architecture
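
The encoder settings above (224x224 input, patch size 4, window size 7) determine how many visual tokens the encoder works with. The sketch below runs that arithmetic. It assumes the standard Swin layout of four stages with a 2x patch merging step between stages, which is not stated on this page, and the function name `swin_token_grid` is made up for illustration.

```python
def swin_token_grid(image_size=224, patch_size=4, window_size=7, num_stages=4):
    """Return (grid side, token count, windows per side) for each stage.

    Assumes the usual Swin design: the patch grid is halved along each
    axis between stages (patch merging). num_stages=4 is an assumption.
    """
    side = image_size // patch_size  # patches along one axis at stage 1
    stages = []
    for _ in range(num_stages):
        stages.append((side, side * side, side // window_size))
        side //= 2  # patch merging halves the grid side
    return stages


if __name__ == "__main__":
    for stage, (side, tokens, windows) in enumerate(swin_token_grid(), start=1):
        print(f"stage {stage}: {side}x{side} grid, {tokens} tokens, "
              f"{windows}x{windows} attention windows")
```

Under these assumptions a 224x224 image starts as a 56x56 grid of patches, and the final stage is a 7x7 grid that fits in a single attention window.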

Core Capabilities

  • Generate natural Brazilian Portuguese captions for images
  • Process images at 224x224 resolution
  • Reach reported scores of METEOR 44.36 and ROUGE-L 39.39
  • Run inference on single images or batches of images
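
A minimal sketch of batched captioning with the Hugging Face `transformers` library follows. It assumes the checkpoint is published on the Hub as `laicsiifes/swin-gportuguese-2` and that it ships tokenizer and image processor configs that the `Auto*` classes can load; neither is confirmed on this page. The helper names `chunked` and `caption_images`, the batch size, and the generation length are illustrative choices, not the authors' settings.

```python
from typing import Iterator, List, Sequence

# Assumed Hub identifier; adjust if the checkpoint lives elsewhere.
MODEL_ID = "laicsiifes/swin-gportuguese-2"


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def caption_images(paths: Sequence[str], batch_size: int = 8,
                   max_new_tokens: int = 25) -> List[str]:
    """Return one Portuguese caption per image path."""
    # Heavy dependencies are imported here so that `chunked` can be
    # used without them installed.
    from PIL import Image
    from transformers import (AutoImageProcessor, AutoTokenizer,
                              VisionEncoderDecoderModel)

    model = VisionEncoderDecoderModel.from_pretrained(MODEL_ID)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    processor = AutoImageProcessor.from_pretrained(MODEL_ID)
    model.eval()

    captions: List[str] = []
    for batch in chunked(paths, batch_size):
        images = [Image.open(path).convert("RGB") for path in batch]
        # The processor resizes and normalizes images for the encoder.
        pixel_values = processor(images=images, return_tensors="pt").pixel_values
        output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
        captions.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return [caption.strip() for caption in captions]
```

A call such as `caption_images(["foto1.jpg", "foto2.jpg"])` (hypothetical file names) would download the weights on first use and return a list with one caption per image.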

Frequently Asked Questions

Q: What makes this model unique?

This model is trained specifically for Brazilian Portuguese image captioning, pairing a pre-trained Swin Transformer encoder with a Portuguese GPT-2 decoder. It is one of the few captioning models trained for Portuguese, and its reported scores are competitive with similar architectures.

Q: What are the recommended use cases?

The model suits applications that need image descriptions in Portuguese, such as accessibility tools, content management systems, and educational resources, particularly where captions must be accurate and read naturally.
