metricx-24-hybrid-xxl-v2p6-bfloat16

google

MetricX-24 Hybrid XXL model for translation evaluation, supporting both reference-based and reference-free assessment with state-of-the-art agreement with human judgments.

Property	Value
Author	Google
Model Type	Translation Evaluation
GitHub	Repository Link
Average Correlation	0.716 (Best in class)

What is metricx-24-hybrid-xxl-v2p6-bfloat16?

MetricX-24 Hybrid XXL is Google's state-of-the-art model for automatic evaluation of translations, submitted to the WMT'24 Metrics Shared Task. This variant stores its weights in bfloat16 (16 bits per parameter instead of the 32 bits used by float32), so it offers the same capabilities as the full-precision model while needing roughly half the memory for weights. A single checkpoint performs both reference-based evaluation and reference-free evaluation, also known as quality estimation (QE).
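To make the memory saving concrete, here is a minimal back-of-the-envelope sketch. The parameter count below is a hypothetical placeholder chosen for illustration and is not taken from this page; only the bytes-per-parameter figures (4 for float32, 2 for bfloat16) are fixed facts. The estimate covers weights only and ignores activations and framework overhead.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


# Hypothetical parameter count, used only for illustration (an assumption).
NUM_PARAMS = 13_000_000_000

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: 4 bytes per parameter
bf16_gib = weight_memory_gib(NUM_PARAMS, 2)  # bfloat16: 2 bytes per parameter

print(f"float32 weights:  {fp32_gib:.1f} GiB")
print(f"bfloat16 weights: {bf16_gib:.1f} GiB")
```

Whatever the true parameter count is, the ratio between the two figures stays at 2:1.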

Implementation Details

The model is initialized from mT5 and fine-tuned on a combination of direct assessment (DA) and Multidimensional Quality Metrics (MQM) data from WMT'15-'22. Predicted scores are clipped to the range 0 to 25, where a lower score indicates a better translation. The training data also includes synthetic examples that help the model handle multi-sentence segments.
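The score clipping described above can be sketched in a few lines. This is an illustration of the idea, not the model's actual code; the function name `clip_score` and its defaults are assumptions, with only the 0 to 25 range taken from the description.

```python
def clip_score(raw_score: float, low: float = 0.0, high: float = 25.0) -> float:
    """Clamp a raw regression output into the [low, high] score range."""
    return max(low, min(high, raw_score))


# Raw outputs slightly outside the range are pulled back to the boundary;
# values already inside the range pass through unchanged.
raw_outputs = [-1.3, 3.2, 30.0]
clipped = [clip_score(x) for x in raw_outputs]
print(clipped)
```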

Supports hybrid evaluation modes (reference-based and reference-free)
Achieves 0.865 system-level correlation for en-de translations
Uses bfloat16 weights for memory-efficient inference
Trained on WMT'15-'22 human-judgment data (direct assessment and MQM)
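The two evaluation modes differ only in whether a reference translation is supplied alongside the source and the candidate. The sketch below shows one way to build the text input for each mode. The helper name `make_input` and the `source:` / `candidate:` / `reference:` field labels are assumptions made for illustration; check the repository's preprocessing code for the exact format the released checkpoint expects.

```python
from typing import Optional


def make_input(source: str, hypothesis: str, reference: Optional[str] = None) -> str:
    """Build a single text input for hybrid evaluation.

    With a reference, the input is reference-based; without one, it is
    reference-free (quality estimation). Field labels are assumed, not
    confirmed by this page.
    """
    text = f"source: {source} candidate: {hypothesis}"
    if reference:
        text += f" reference: {reference}"
    return text


# Reference-based evaluation: source, candidate, and reference.
with_ref = make_input("Guten Morgen.", "Good morning.", "Good morning!")

# Reference-free evaluation (QE): source and candidate only.
without_ref = make_input("Guten Morgen.", "Good morning.")

print(with_ref)
print(without_ref)
```

Keeping both modes behind one function mirrors the hybrid design: the same model consumes either form of input.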

Core Capabilities

High-accuracy translation quality assessment
Multi-language support including en-de, en-es, and ja-zh pairs
Segment-level and system-level evaluation
Handles both single-sentence and multi-sentence segments
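Segment-level scores can be rolled up into system-level scores. A minimal sketch follows, assuming the common convention of averaging each system's segment scores and assuming that lower scores indicate fewer errors; the system names and score values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def system_scores(segments):
    """Average per-segment scores for each system.

    `segments` is an iterable of (system_name, segment_score) pairs.
    """
    by_system = defaultdict(list)
    for system, score in segments:
        by_system[system].append(score)
    return {system: mean(scores) for system, scores in by_system.items()}


# Made-up segment-level scores for two hypothetical systems.
segment_scores = [("sysA", 1.0), ("sysA", 3.0), ("sysB", 4.0), ("sysB", 6.0)]

# Rank systems from best to worst, treating a lower score as better.
ranking = sorted(system_scores(segment_scores).items(), key=lambda kv: kv[1])
print(ranking)
```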

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its hybrid design: a single checkpoint handles both reference-based and reference-free evaluation, so no separate quality-estimation model is needed. It achieves state-of-the-art correlation with human judgments, and its training data includes synthetic examples aimed at multi-sentence segments.

Q: What are the recommended use cases?

The XXL variant is recommended for applications requiring the highest agreement with human judgments of translation quality. It's particularly suitable for professional translation evaluation systems and research applications where accuracy is paramount.