cogagent-chat-hf

THUDM

CogAgent-18B is a visual language model with 18.3B parameters that specializes in GUI operations, visual dialogue, and high-resolution image processing at up to 1120x1120 pixels.

Property            Value
Parameter Count     18.3B (11B visual + 7B language)
License             Apache-2.0
Paper               Link to Paper
Supported Formats   F32, BF16

What is cogagent-chat-hf?

CogAgent-chat-hf is a visual language model built on CogVLM and designed for GUI operations, multi-turn visual dialogue, and visual grounding tasks. The 18.3B-parameter model accepts high-resolution image inputs of up to 1120x1120 pixels.

Implementation Details

The model architecture combines 11 billion visual parameters with 7 billion language parameters for image understanding and interaction. It is Transformer-based and supports both F32 and BF16 tensor types, so it can be deployed at full or reduced precision.

  • Supports high-resolution image inputs (1120x1120)
  • Implements advanced visual grounding capabilities
  • Features multi-turn dialogue support
  • Includes specialized GUI operation capabilities
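As a rough illustration of what the two tensor types mean for deployment, the sketch below estimates weight storage from the 18.3B parameter count and shows one plausible way to load the model in BF16. The repository id `THUDM/cogagent-chat-hf`, the `trust_remote_code=True` flag, and the helper names `weight_memory_gb` and `load_cogagent` are assumptions for illustration and are not taken from this page; the estimate covers weights only and ignores activations and caches.

```python
# Minimal sketch: weight-memory arithmetic for F32 vs BF16, plus a hedged loader.
# Helper names and the repository id are illustrative assumptions.

BYTES_PER_PARAM = {"F32": 4, "BF16": 2}  # bytes used to store one parameter
TOTAL_PARAMS = 18.3e9                    # parameter count quoted in this card


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (10**9 bytes); excludes activations."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def load_cogagent(model_id: str = "THUDM/cogagent-chat-hf", device: str = "cuda"):
    """Hypothetical loader: assumes the checkpoint is published under `model_id`
    and ships custom modeling code (hence trust_remote_code=True)."""
    # Imported lazily so the arithmetic above runs without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # BF16 halves weight memory relative to F32
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )
    return model.to(device).eval()


if __name__ == "__main__":
    for dtype in ("F32", "BF16"):
        gb = weight_memory_gb(TOTAL_PARAMS, dtype)
        print(f"{dtype}: ~{gb:.1f} GB of weights")
```

Under these assumptions the weights alone need roughly 73 GB in F32 and roughly 37 GB in BF16, which is why BF16 is the more practical choice on a single accelerator.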

Core Capabilities

  • State-of-the-art performance on 9 cross-modal benchmarks including VQAv2, MM-Vet, and POPE
  • Advanced GUI operation abilities, with particularly strong results on the AITW and Mind2Web datasets
  • Enhanced OCR-related task handling
  • Sophisticated visual dialogue and interaction capabilities
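To act on a grounded answer in a GUI setting, the model's text output has to be turned into screen coordinates. The sketch below assumes that boxes appear in the text as `[[x0,y0,x1,y1]]` with integer values on a 0 to 999 scale relative to the image size, which is the convention commonly described for the CogVLM family. This page does not specify the output format, so treat both the pattern and the `scale` divisor as assumptions; `boxes_to_pixels` is a hypothetical helper name.

```python
import re

# Assumed box syntax: [[x0,y0,x1,y1]] with 1-3 digit integers (0-999 scale).
BOX_PATTERN = re.compile(
    r"\[\[\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*,\s*(\d{1,3})\s*\]\]"
)


def boxes_to_pixels(text: str, width: int, height: int, scale: int = 1000):
    """Extract every assumed-format box from `text` and map it to pixel coordinates.

    `scale` is the assumed normalization range of the model's coordinates.
    Returns a list of (x0, y0, x1, y1) integer tuples.
    """
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x0, y0, x1, y1 = (int(group) for group in match.groups())
        boxes.append((
            x0 * width // scale,
            y0 * height // scale,
            x1 * width // scale,
            y1 * height // scale,
        ))
    return boxes


if __name__ == "__main__":
    reply = "Click the search button [[100,200,300,400]]"  # made-up reply text
    print(boxes_to_pixels(reply, width=1120, height=1120))
```

A caller would typically take the center of the returned box as the click target; that choice is left out here because it depends on the automation framework in use.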

Frequently Asked Questions

Q: What makes this model unique?

CogAgent-chat-hf stands out for its exceptional GUI agent capabilities and visual grounding functions, making it particularly suitable for applications requiring interaction with graphical interfaces and multi-turn visual dialogues.

Q: What are the recommended use cases?

The model is well suited to GUI automation tasks, visual question answering, and scenarios that require detailed image understanding and interaction. It is particularly strong at handling the interfaces of web pages, PC apps, and mobile applications.
