deplot

google

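DePlot is a 282M-parameter plot-to-text model that translates plots and charts into linearized tables for an LLM to reason over. Paired with an LLM, its paper reports state-of-the-art results on chart question answering. The listed languages are English, French, Romanian, and German, plus a multilingual tag.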

Property              Value
Parameter Count       282M
License               Apache 2.0
Paper                 arXiv:2212.10505
Languages Supported   English, French, Romanian, German, Multilingual
Architecture          Pix2Struct-based Transformer

What is DePlot?

DePlot is a visual language model developed by Google that enables one-shot visual language reasoning over charts and plots. It decomposes the task into two steps: (1) plot-to-text translation, in which DePlot converts the image into a linearized table, and (2) reasoning over the translated text, which is handed to an LLM.

Implementation Details

The model uses the Pix2Struct architecture with 282M parameters and is available as PyTorch weights with Safetensors support. It converts a plot image into a linearized table, which a Large Language Model (LLM) can then process for reasoning tasks.

  • Built on a Transformer (Pix2Struct) architecture with an image encoder and a text decoder
  • Weights stored as F32 tensors
  • Serves as the plot-to-table stage of a visual question-answering pipeline
  • Listed languages: English, French, Romanian, and German, plus a multilingual tag
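
The plot-to-table step described above can be sketched roughly as follows. This is a minimal illustration, not an official implementation: the checkpoint id `google/deplot` and the prompt string are taken from the public Hugging Face model card and should be checked against it, and the row separator `<0x0A>` and cell separator `|` are assumptions about how the decoded output is linearized. `plot_to_table` and `parse_linearized_table` are hypothetical helper names.

```python
from typing import List

# Assumptions: the decoded output separates rows with "<0x0A>" and cells
# with "|". Verify both against real output from the checkpoint.
ROW_SEP = "<0x0A>"
CELL_SEP = "|"
# Prompt string as given on the Hugging Face model card (assumed unchanged).
DEPLOT_PROMPT = "Generate underlying data table of the figure below:"


def plot_to_table(image_path: str, model_id: str = "google/deplot") -> str:
    """Run the plot-to-text step and return the linearized table string."""
    # Imported lazily so the parser below works without these packages.
    from PIL import Image
    from transformers import (
        Pix2StructForConditionalGeneration,
        Pix2StructProcessor,
    )

    processor = Pix2StructProcessor.from_pretrained(model_id)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=DEPLOT_PROMPT, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def parse_linearized_table(text: str) -> List[List[str]]:
    """Split a linearized table into rows of stripped cell strings."""
    rows = []
    for line in text.split(ROW_SEP):
        line = line.strip()
        if not line:
            continue  # skip empty rows left by leading/trailing separators
        rows.append([cell.strip() for cell in line.split(CELL_SEP)])
    return rows
```

Calling `parse_linearized_table(plot_to_table("chart.png"))` would yield a list of rows; `chart.png` is a placeholder path. Keeping the parser separate from the model call makes it easy to test on saved outputs without loading the weights.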

Core Capabilities

  • One-shot visual language reasoning
  • Plot and chart comprehension
  • Automatic data table generation from visual inputs
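  • 24.0% improvement over the previous finetuned SOTA on human-written chart QA queries, reported in the paper for DePlot paired with an LLM under one-shot prompting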
  • Multilingual support for diverse applications

Frequently Asked Questions

Q: What makes this model unique?

DePlot's distinctive feature is that, paired with an LLM, it supports one-shot visual language reasoning: a single in-context example is enough, whereas previous models were finetuned on tens of thousands of examples. It achieves this through its two-step approach of plot-to-text translation followed by LLM reasoning.
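
The second step, one-shot reasoning over the translated table, might be set up along these lines. The prompt template, the function name `build_one_shot_prompt`, and the dictionary keys are all hypothetical choices for illustration; the paper's actual prompts are not reproduced here, and the LLM call itself is left out.

```python
def build_one_shot_prompt(example: dict, table: str, question: str) -> str:
    """Assemble a one-shot prompt: one solved example, then the new query.

    `example` is expected to hold the keys "table", "question" and "answer"
    (a hypothetical layout chosen for this sketch).
    """
    parts = [
        "Read the table and answer the question.",
        "",
        # The single in-context demonstration.
        f"Table:\n{example['table']}",
        f"Question: {example['question']}",
        f"Answer: {example['answer']}",
        "",
        # The new table (e.g. produced by DePlot) and the user's question.
        f"Table:\n{table}",
        f"Question: {question}",
        "Answer:",
    ]
    return "\n".join(parts)
```

The resulting string would be sent to whichever LLM is in use; because the chart has already been turned into text, no image-capable model is needed at this stage.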

Q: What are the recommended use cases?

The model suits applications involving chart and plot analysis, automated data extraction from visualizations, and visual question-answering systems. It is particularly useful where charts appear in one of the listed languages and where finetuning a task-specific model would require extensive training data.
