dolphin-vision-72b

cognitivecomputations

A 73.2B-parameter multimodal LLM built on Qwen2-72B, capable of advanced vision-language tasks with uncensored reasoning and strong benchmark performance.

Property          Value
Parameter Count   73.2B
Base Model        Qwen/Qwen2-72B
License           Tongyi-Qianwen
Tensor Type       BF16

What is dolphin-vision-72b?

Dolphin-Vision-72B is a multimodal language model developed by Cognitive Computations that combines vision-language capabilities with unrestricted reasoning. Built on the Qwen2-72B architecture, it was trained on 8 diverse datasets so that it can understand and analyze both textual and visual inputs.

Implementation Details

The model was trained using the Axolotl framework on a mix of datasets that includes Dolphin-2.9 and OpenHermes-2.5 along with dedicated mathematical and coding datasets. It scores 83.6 on VQA v2 and 81.2 on MMBench, competing closely with GPT-4V.

  • Multimodal processing with advanced vision-language capabilities
  • Uncensored reasoning and detailed image analysis
  • Efficient BF16 tensor format implementation
  • Comprehensive dataset training across multiple domains
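
To put the BF16 format and the 73.2B parameter count in practical terms, the weight footprint can be estimated with simple arithmetic. The sketch below assumes 2 bytes per parameter (the size of a BF16 value) and counts weights only; activations, the KV cache, and framework overhead are not included, and the helper name `weight_memory_bytes` is purely illustrative.

```python
def weight_memory_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate memory for model weights alone (BF16 uses 2 bytes per value)."""
    return num_params * bytes_per_param


PARAMS = 73.2e9  # parameter count listed for dolphin-vision-72b

total_bytes = weight_memory_bytes(PARAMS)
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2**30    # binary gibibytes

print(f"Approximate BF16 weight size: {total_gb:.1f} GB ({total_gib:.1f} GiB)")
```

Under these assumptions the weights alone come to roughly 146 GB, so the model would typically need to be sharded across several accelerators or quantized to run on smaller hardware.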

Core Capabilities

  • Advanced visual question answering and analysis
  • Detailed OCR and text extraction from images
  • Mathematical reasoning and problem-solving
  • Unrestricted image interpretation and commentary
  • Complex visual-textual understanding tasks

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its uncensored reasoning and its benchmark performance on vision-language tasks. It is built on the Qwen2-72B architecture and trained on a curated set of 8 datasets.

Q: What are the recommended use cases?

The model excels in visual question answering, detailed image analysis, OCR tasks, and mathematical reasoning. It's particularly suitable for applications requiring unrestricted image interpretation and complex visual-textual understanding.
