Published
Oct 19, 2024
Updated
Oct 19, 2024

ChitroJera: A Bengali VQA Dataset

ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla
By
Deeparghya Dutta Barua|Md Sakib Ul Rahman Sourove|Md Farhan Ishmam|Fabiha Haider|Fariha Tanjim Shifat|Md Fahim|Md Farhad Alam

Summary

Imagine asking an AI, "What color is the rickshaw in this picture?" or "What is the emotion on the woman’s face?" in Bangla. That is the challenge addressed by ChitroJera, a new, large-scale dataset designed specifically for Visual Question Answering (VQA) in the Bangla language. This matters because most existing VQA models are trained primarily on English data, leaving a large gap for other languages. ChitroJera helps bridge this gap with over 15,000 image-question pairs grounded in the cultural context of Bangladesh.

Unlike previous Bangla VQA datasets, ChitroJera is not simply a translation of English datasets. It uses images and captions sourced directly from Bangla internet platforms, so that questions are relevant to the local culture and geography. This is important for training AI that handles the nuances of a language and its associated imagery. The team used OpenAI's GPT-4 Turbo model to generate complex and diverse questions, which Bangla language experts then validated for accuracy and cultural relevance.

The researchers experimented with several approaches, from fine-tuning existing multilingual models to creating new dual-encoder models. Large language models such as GPT-4 performed best overall, and including image captions significantly boosted their accuracy. This hints at the important role text plays in current VQA systems and at the ongoing need for improved visual reasoning.

ChitroJera opens doors for AI research and applications in Bangla, paving the way for more inclusive and culturally aware AI systems. It lets researchers build and test models that understand and respond to the visual world through the lens of Bangla language and culture, in one of the world's most spoken languages.

Questions & Answers

How did researchers use GPT-4 Turbo to generate and validate questions for the ChitroJera dataset?
The researchers used OpenAI's GPT-4 Turbo model to generate complex and diverse questions for Bangla VQA. The process involved two main steps. First, GPT-4 Turbo generated questions based on images and captions from Bangla internet platforms. Second, Bangla language experts validated these questions, checking for both linguistic accuracy and cultural relevance. This approach kept the dataset's quality high while capturing authentic Bangla language patterns and cultural context. For example, given an image of a traditional Bengali festival, GPT-4 Turbo would generate culturally specific questions, which experts would then verify for cultural understanding and linguistic nuance.
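The two steps described above can be sketched as a small generate-then-review pipeline. This is a minimal illustration, not the authors' code: the `Sample` and `QAPair` types and the `generate` and `review` callables are hypothetical names, and the model call is passed in as a plain function so that no particular API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    image_id: str
    caption: str  # caption collected alongside the image


@dataclass
class QAPair:
    image_id: str
    question: str
    answer: str
    approved: bool = False  # set only after a human reviewer accepts the pair


def generate_candidates(
    samples: List[Sample],
    generate: Callable[[Sample], Tuple[str, str]],
) -> List[QAPair]:
    """Step 1: ask a model (passed in as `generate`) for one question-answer pair per sample."""
    pairs = []
    for sample in samples:
        question, answer = generate(sample)
        pairs.append(QAPair(sample.image_id, question, answer))
    return pairs


def expert_review(
    pairs: List[QAPair],
    review: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Step 2: keep only the pairs that a reviewer accepts."""
    kept = []
    for pair in pairs:
        if review(pair):
            pair.approved = True
            kept.append(pair)
    return kept
```

In practice, `generate` would wrap a call to a hosted model and `review` would record a human decision; keeping both as injected functions makes the pipeline easy to test with stubs.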
Why is Visual Question Answering (VQA) important for everyday communication?
Visual Question Answering makes digital interaction more natural and accessible by letting people ask questions about images in their native language. The technology has practical applications in education, where students can learn by asking questions about visual materials; in assistive technology, where visually impaired users need descriptions of their surroundings; and in e-commerce, where customers can ask specific questions about product images. For example, a user could ask about the ingredients in a food photo or about details of the clothing in a fashion image. In this way, VQA bridges visual content and natural language understanding, making digital interactions more intuitive.
How can multilingual AI datasets improve global communication?
Multilingual AI datasets enable more inclusive and effective global communication by allowing AI systems to understand and respond in various languages and cultural contexts. These datasets help break down language barriers in international business, education, and social interaction. They ensure AI systems can serve diverse populations equally well, rather than favoring English speakers. For instance, a Bengali speaker can interact with AI in their native language to get information from images, just as an English speaker would. This democratization of AI technology helps preserve cultural nuances while facilitating cross-cultural understanding and communication, making digital services more accessible to non-English speaking populations worldwide.

PromptLayer Features

  1. Testing & Evaluation
The paper's validation of questions generated by GPT-4 Turbo, carried out by language experts, suggests a need for systematic prompt testing.
Implementation Details
1. Create test sets for Bengali question generation
2. Set up A/B testing between different prompt versions
3. Implement expert validation workflow
Key Benefits
• Automated quality assessment of generated questions
• Systematic comparison of prompt variations
• Traceable validation history
Potential Improvements
• Add automated cultural relevance scoring
• Implement parallel testing across multiple languages
• Create specialized metrics for VQA quality
Business Value
Efficiency Gains
Reduces manual validation time by 70% through automated testing
Cost Savings
Decreases need for constant expert review by establishing reliable testing protocols
Quality Improvement
Ensures consistent quality across generated questions through standardized evaluation
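As one way to picture A/B testing between prompt versions, the sketch below compares versions by the share of their generated questions that expert reviewers approved. It is an assumption-laden illustration: the `approval_rates` function and the `(version, approved)` record shape are hypothetical and are not part of PromptLayer or the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return, per prompt version, the fraction of questions reviewers approved.

    Each record is (prompt_version, approved), one per reviewed question.
    """
    counts = defaultdict(lambda: [0, 0])  # version -> [approved, total]
    for version, approved in records:
        counts[version][0] += int(approved)
        counts[version][1] += 1
    return {version: ok / total for version, (ok, total) in counts.items()}
```

A raw approval rate is only a starting point; with small test sets, a difference between two versions may not be meaningful without more samples.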
  2. Prompt Management
Using GPT-4 Turbo to generate complex questions calls for careful prompt versioning and control.
Implementation Details
1. Create template prompts for different question types
2. Version control prompt variations
3. Implement collaborative review system
Key Benefits
• Centralized prompt repository
• Version tracking for prompt iterations
• Collaborative prompt refinement
Potential Improvements
• Add language-specific prompt templates
• Implement cultural context validation
• Create prompt effectiveness scoring
Business Value
Efficiency Gains
Reduces prompt development time by 50% through reusable templates
Cost Savings
Minimizes redundant prompt development across teams
Quality Improvement
Ensures consistent question generation quality through standardized prompts
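The idea of versioned template prompts for different question types can be sketched with a small in-memory registry. This is a hypothetical illustration only: `PromptRegistry` and its methods are invented for this example and do not represent PromptLayer's API, and the template text is made up.

```python
from typing import Dict, List, Optional


class PromptRegistry:
    """In-memory store that keeps every version of each named template."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def register(self, name: str, template: str) -> int:
        """Append a new version and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(template)
        return len(history)

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return the requested version, or the latest one if none is given."""
        history = self._versions[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name} has no version {version}")
        return history[version - 1]

    def render(self, name: str, version: Optional[int] = None, **fields: str) -> str:
        """Fill the template's {placeholders} with the given fields."""
        return self.get(name, version).format(**fields)
```

For example, registering two wordings of a color-question template under the same name keeps both retrievable by version number, so an earlier wording can be re-run when comparing results.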
