MetricX-23-QE-Large-V2P0
Property | Value |
---|---|
License | Apache 2.0 |
Author | |
Framework | PyTorch |
Paper | MetricX-23: The Google Submission to the WMT 2023 Metrics Shared Task |
What is metricx-23-qe-large-v2p0?
MetricX-23-QE-Large-V2P0 is a reference-free (quality estimation, QE) model for the automatic evaluation of machine translations. It is built on the mT5 architecture and belongs to Google's MetricX-23 family of models submitted to the WMT'23 Metrics Shared Task. It scores a translation using only the source text, so no reference translation is needed. This makes it practical in real-world settings where references are unavailable.
Implementation Details
The model is initialized from mT5 and fine-tuned on a combination of direct assessment (DA) and MQM (Multidimensional Quality Metrics) data. It outputs a score in the range 0 to 25, where lower scores indicate better translation quality. The maximum input length is 1024 tokens. Training also included synthetic data designed to cover translation edge cases.
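Because the score is an error-like value on a 0 to 25 scale, downstream code has to treat smaller numbers as better. The sketch below clips a raw regression output into that range and ranks candidate translations from best to worst. It is a minimal illustration assuming scores have already been produced by the model; the helper names `clamp_score` and `rank_hypotheses` are hypothetical and not part of any MetricX API.

```python
from typing import Dict, List, Tuple

# Score bounds taken from the model description: 0 is best, 25 is worst.
MIN_SCORE = 0.0
MAX_SCORE = 25.0


def clamp_score(raw: float) -> float:
    """Clip a raw regression output into the documented [0, 25] range."""
    return max(MIN_SCORE, min(MAX_SCORE, raw))


def rank_hypotheses(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort hypotheses from best to worst (lower MetricX-23 score = better)."""
    clamped = {hyp: clamp_score(score) for hyp, score in scores.items()}
    return sorted(clamped.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical scores for three candidate translations of one source sentence.
    example = {"hyp_a": 3.2, "hyp_b": 0.8, "hyp_c": 26.1}
    for hyp, score in rank_hypotheses(example):
        print(f"{hyp}: {score:.1f}")
```

Sorting in ascending order is the key design point. A reader used to "higher is better" metrics such as BLEU or COMET can easily invert the ranking by mistake.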
- Trained with synthetic data to improve robustness on edge cases such as over- and under-translation
- Evaluates a hypothesis against its source, with no reference translation required
- Implemented in PyTorch
- Supports batched inference for higher throughput
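In reference-free mode, each example pairs a hypothesis with its source rather than with a reference. The sketch below builds one input string per pair and groups the strings into fixed-size batches. The `"candidate: ... source: ..."` layout is an assumption based on the public MetricX code for QE models, so verify it against the official repository before relying on it. Tokenization with the mT5 tokenizer, truncation to 1024 tokens and the model call are deliberately left out. `make_qe_input` and `batched` are hypothetical helpers.

```python
from typing import Iterator, List, Sequence


def make_qe_input(source: str, hypothesis: str) -> str:
    """Build a reference-free (QE) input string.

    ASSUMPTION: this "candidate: ... source: ..." layout mirrors the public
    MetricX code for QE models; check it against the official repository.
    """
    return f"candidate: {hypothesis} source: {source}"


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    sources = ["Das Haus ist klein.", "Ich habe Hunger.", "Guten Morgen!"]
    hypotheses = ["The house is small.", "I am hungry.", "Good morning!"]
    inputs = [make_qe_input(s, h) for s, h in zip(sources, hypotheses)]
    for batch in batched(inputs, batch_size=2):
        # Each batch would next be tokenized (truncated to 1024 tokens)
        # and passed to the model; that step is omitted here.
        print(batch)
```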
Core Capabilities
- Reference-free translation quality assessment
- Supports multiple language pairs through its multilingual mT5 backbone
- Robust evaluation of translation errors and quality issues
- Strong agreement with human judgments (reported system-level accuracy of 0.744 for EN-DE)
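System-level accuracy is usually computed as pairwise accuracy. Over all pairs of MT systems, it measures the fraction where the metric orders the two systems the same way human ratings do. The sketch below implements that general definition. It assumes MetricX scores are lower-is-better and, by default, that human scores are higher-is-better; both directions are parameters because human score conventions vary. Official WMT'23 scoring scripts may differ in details such as tie handling, so treat this as an illustration rather than a reproduction of the reported 0.744.

```python
from itertools import combinations
from typing import Dict


def _sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_accuracy(
    metric: Dict[str, float],
    human: Dict[str, float],
    metric_lower_is_better: bool = True,
    human_lower_is_better: bool = False,
) -> float:
    """Fraction of system pairs that the metric orders the same way as humans."""
    systems = sorted(metric.keys() & human.keys())
    pairs = list(combinations(systems, 2))
    if not pairs:
        raise ValueError("need at least two systems scored by both sources")
    # Flip signs so that a positive difference always means "a is better than b".
    m_dir = -1 if metric_lower_is_better else 1
    h_dir = -1 if human_lower_is_better else 1
    agree = sum(
        _sign(m_dir * (metric[a] - metric[b])) == _sign(h_dir * (human[a] - human[b]))
        for a, b in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical system-level scores: MetricX (lower is better) vs. human ratings.
    metricx = {"sys1": 2.1, "sys2": 3.4, "sys3": 1.7}
    humans = {"sys1": 80.0, "sys2": 75.0, "sys3": 78.0}
    print(pairwise_accuracy(metricx, humans))
```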
Frequently Asked Questions
Q: What makes this model unique?
It evaluates translation quality without reference translations, and its training on synthetic data makes it more robust to edge cases. Together, these properties make it valuable in real-world applications where reference translations are unavailable.
Q: What are the recommended use cases?
The model is well suited to production environments that need automated translation quality assessment without reference translations. Typical uses include large-scale translation evaluation, continuous monitoring of MT systems and quality assurance in translation pipelines.
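In a quality-assurance pipeline, per-segment QE scores are often used in two ways. First, segments whose predicted error score is too high are routed to human review. Second, batch averages are tracked over time to monitor an MT system. The sketch below shows both. The threshold of 5.0 is purely hypothetical and would need tuning on in-domain data with human ratings. `ScoredSegment`, `flag_for_review` and `mean_score` are illustrative names, not part of any MetricX tooling.

```python
from dataclasses import dataclass
from typing import List

# HYPOTHETICAL threshold: tune on in-domain data with human quality ratings.
DEFAULT_THRESHOLD = 5.0


@dataclass
class ScoredSegment:
    segment_id: str
    score: float  # MetricX-23-QE score in [0, 25]; lower is better


def flag_for_review(segments: List[ScoredSegment],
                    threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Return IDs of segments whose predicted error score exceeds the threshold."""
    return [seg.segment_id for seg in segments if seg.score > threshold]


def mean_score(segments: List[ScoredSegment]) -> float:
    """Average score for a batch, e.g. to track an MT system over time."""
    if not segments:
        raise ValueError("no segments to average")
    return sum(seg.score for seg in segments) / len(segments)


if __name__ == "__main__":
    batch = [
        ScoredSegment("s1", 1.2),
        ScoredSegment("s2", 7.9),
        ScoredSegment("s3", 4.0),
    ]
    print("needs review:", flag_for_review(batch))
    print("batch mean:", round(mean_score(batch), 2))
```

The comparison is `score > threshold` because higher MetricX scores indicate worse translations.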