MetricX-23-QE-Large-V2P0

Property	Value
License	Apache 2.0
Author	Google
Framework	PyTorch
Paper	MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task

What is metricx-23-qe-large-v2p0?

MetricX-23-QE-Large-V2P0 is a reference-free (quality estimation) model for automatic evaluation of machine translation. Built on the mT5 architecture, it is part of Google's MetricX-23 family of models submitted to the WMT'23 Metrics Shared Task. It scores a translation against its source text alone, so it can be used in settings where no reference translation exists.

Implementation Details

The model is initialized from mT5 and fine-tuned on a combination of direct assessment (DA) and MQM (Multidimensional Quality Metrics) data. It outputs an error score in the range 0-25, where lower scores indicate better translation quality. The maximum input length is 1024 tokens, and the training data is augmented with synthetic examples that target translation edge cases.
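
Because lower scores mean better translations, downstream code that expects a "higher is better" quality value has to flip the scale. The following is a minimal sketch of one way to do that. The helper name `to_quality` and the linear rescaling to [0, 1] are assumptions made for illustration and are not part of MetricX itself.

```python
MAX_ERROR_SCORE = 25.0  # upper end of the documented 0-25 score range


def to_quality(error_score: float) -> float:
    """Map an error score (0 = best, 25 = worst) to [0, 1], where 1 = best.

    Hypothetical helper: the linear rescaling is an illustrative choice,
    not something defined by the model card.
    """
    # Clamp first so a value outside the documented range cannot produce
    # a result below 0 or above 1.
    clamped = min(max(error_score, 0.0), MAX_ERROR_SCORE)
    return 1.0 - clamped / MAX_ERROR_SCORE
```

With this mapping, an error score of 0 becomes 1.0 and an error score of 25 becomes 0.0. Rankings are preserved in reverse, so the rescaling changes presentation only, not which translation is judged better.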

Trained with robust synthetic data for handling edge cases like over/under-translation
Supports source-to-hypothesis evaluation without reference translations
Implemented in PyTorch for inference
Processes input in batches for improved performance
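
The points above on source-to-hypothesis input and batching can be sketched in plain Python. This is an illustration under stated assumptions: the `source` and `hypothesis` field names are assumed to match the JSON Lines input convention of the MetricX repository and should be checked against its README, and `make_batches` and `to_jsonl` are hypothetical helpers, not MetricX functions.

```python
import json
from typing import Iterator


def make_batches(examples: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size examples."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSON Lines, one JSON object per line."""
    # ensure_ascii=False keeps non-Latin scripts readable in the output file.
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


# No "reference" field is needed for a reference-free (QE) model.
# Field names are an assumption; see the lead-in.
examples = [
    {"source": "Guten Morgen.", "hypothesis": "Good morning."},
    {"source": "Wie geht es dir?", "hypothesis": "How are you?"},
    {"source": "Danke schön.", "hypothesis": "Thank you very much."},
]

batches = list(make_batches(examples, batch_size=2))
jsonl_text = to_jsonl(examples)
```

Here three examples with a batch size of 2 produce one batch of two and one batch of one. The batch size mainly trades memory for throughput; it does not change the inputs themselves.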

Core Capabilities

Reference-free translation quality assessment
Handles multiple language pairs effectively
Robust evaluation of translation errors and quality issues
Strong correlation with human judgments (system-level accuracy of 0.744 for EN-DE)
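
For readers unfamiliar with system-level accuracy, the sketch below shows the pairwise-ranking idea commonly used in WMT metrics evaluations: for every pair of MT systems, check whether the metric orders them the same way human scores do. It assumes that definition. The exact protocol behind the figure quoted above (for example, how ties or multiple language pairs are handled) is not reproduced here, and the function name and toy scores are hypothetical.

```python
from itertools import combinations


def pairwise_accuracy(human: dict[str, float], metric: dict[str, float]) -> float:
    """Fraction of system pairs that both score sets rank in the same order.

    Both dictionaries map system name -> system-level score and must share
    the same orientation (for example, both lower-is-better).
    Ties in either score set are counted as disagreements in this sketch.
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(human), 2):
        human_delta = human[a] - human[b]
        metric_delta = metric[a] - metric[b]
        total += 1
        # Same sign of the two differences means the same ranking.
        if human_delta * metric_delta > 0:
            agree += 1
    if total == 0:
        raise ValueError("need at least two systems")
    return agree / total


# Toy, made-up scores for three systems (lower is better in both).
human_scores = {"sysA": 1.0, "sysB": 2.0, "sysC": 3.0}
metric_scores = {"sysA": 0.5, "sysB": 2.5, "sysC": 2.0}
accuracy = pairwise_accuracy(human_scores, metric_scores)
```

In the toy data, the metric agrees with the human ranking on the pairs (sysA, sysB) and (sysA, sysC) but not on (sysB, sysC), so two of three pairs agree.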

Frequently Asked Questions

Q: What makes this model unique?

The model's ability to evaluate translation quality without reference translations, combined with its robust training on synthetic data for handling edge cases, makes it particularly valuable for real-world applications where reference translations are unavailable.

Q: What are the recommended use cases?

The model is ideal for production environments where automated translation quality assessment is needed without reference translations. It's particularly useful for large-scale translation evaluation, continuous monitoring of MT systems, and quality assurance in translation pipelines.
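
As a sketch of the monitoring use case, the snippet below aggregates segment-level scores that have already been produced by the model and flags segments for human review. The function name `summarize_scores` and the threshold of 5.0 are assumptions chosen for illustration; a suitable threshold would need to be calibrated on your own data.

```python
from statistics import mean


def summarize_scores(scores: list[float], threshold: float = 5.0) -> dict:
    """Summarize segment-level error scores (0-25, lower is better).

    Hypothetical helper. The default threshold is an arbitrary
    illustrative value, not a recommendation from the model authors.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Indices of segments whose error score exceeds the review threshold.
    flagged = [i for i, score in enumerate(scores) if score > threshold]
    return {
        "mean_score": mean(scores),
        "flagged": flagged,
        "flagged_rate": len(flagged) / len(scores),
    }


# Made-up scores for four segments.
report = summarize_scores([1.0, 6.0, 3.0, 10.0], threshold=5.0)
```

Tracking the mean score and the flagged rate over time gives a simple signal for regressions in an MT system, while the flagged indices point reviewers to the segments most likely to contain errors.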