VideoScore
| Property | Value |
|---|---|
| Parameter Count | 8.27B |
| License | Apache 2.0 |
| Paper | arXiv:2406.15252 |
| Base Model | Mantis-8B-Idefics2 |
| Tensor Type | BF16 |
What is VideoScore?
VideoScore is an advanced video quality evaluation model developed by TIGER-Lab that aims to simulate human judgment in assessing video quality. Built on the Mantis-8B-Idefics2 architecture and trained on the VideoFeedback dataset, it evaluates videos across five crucial dimensions: visual quality, temporal consistency, dynamic degree, text-to-video alignment, and factual consistency.
Implementation Details
The model takes a regression-based approach to video evaluation, producing a score between 1.0 and 4.0 for each evaluation aspect. It samples up to 16 frames per video and feeds them, together with the text prompt, through the Mantis-8B-Idefics2 multimodal transformer to analyze both visual and textual content.
- Achieves a Spearman correlation above 75 with human evaluations
- Outperforms existing MLLM-prompting methods and feature-based metrics
- Supports multiple evaluation benchmarks including VideoFeedback-test, EvalCrafter, GenAI-Bench, and VBench
- Processes videos using PyAV for frame extraction and analysis, as sketched below
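As a rough illustration of the frame-handling step, the sketch below decodes a clip with PyAV and uniformly samples up to 16 frames. It is a minimal sketch, not the official preprocessing code; `sample_frames_uniform` is a hypothetical helper name.

```python
import av
import numpy as np

def sample_frames_uniform(video_path: str, max_frames: int = 16):
    """Decode a clip with PyAV and uniformly sample up to `max_frames` RGB frames."""
    container = av.open(video_path)
    # Decode every frame of the first video stream into an RGB numpy array.
    frames = [frame.to_ndarray(format="rgb24") for frame in container.decode(video=0)]
    container.close()
    if len(frames) <= max_frames:
        return frames
    # Evenly spaced indices across the clip give uniform temporal sampling.
    indices = np.linspace(0, len(frames) - 1, num=max_frames).round().astype(int)
    return [frames[i] for i in indices]
```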
Core Capabilities
- Multi-dimensional video quality assessment across five key aspects
- Automated scoring system with human-like judgment capabilities
- Efficient processing of video frames with uniform sampling
- Integration with popular machine learning frameworks
- Support for both regression and generation-based evaluation approaches
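To make the regression mode concrete, here is a minimal inference sketch. It assumes the `Idefics2ForSequenceClassification` regression head distributed with the Mantis codebase; the import path, the simplified prompt template, and the output layout (one logit per aspect) are assumptions based on the description above, not the official usage example.

```python
import torch
from PIL import Image
from transformers import AutoProcessor
# Regression head for Idefics2 shipped with the Mantis codebase; this import
# path is an assumption and may differ depending on how Mantis is installed.
from mantis.models.idefics2 import Idefics2ForSequenceClassification

MODEL_ID = "TIGER-Lab/VideoScore"
ASPECTS = [
    "visual quality",
    "temporal consistency",
    "dynamic degree",
    "text-to-video alignment",
    "factual consistency",
]

processor = AutoProcessor.from_pretrained(MODEL_ID)
# The checkpoint is stored in BF16 (see the table above); pass
# torch_dtype=torch.bfloat16 on supported hardware for faster inference.
model = Idefics2ForSequenceClassification.from_pretrained(MODEL_ID).eval()

def score_video(frames, text_prompt: str) -> dict:
    """Return one 1.0-4.0 regression score per evaluation aspect.

    `frames` is a list of RGB numpy arrays, e.g. from the PyAV sampler above.
    The prompt template here is deliberately simplified, not the official one.
    """
    images = [Image.fromarray(f) for f in frames]
    # One <image> placeholder per sampled frame, followed by the evaluation text.
    prompt = "<image>" * len(images) + f"\nEvaluate this video. Text prompt: {text_prompt}"
    inputs = processor(text=prompt, images=images, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # assumed shape: (1, num_aspects)
    return {name: round(logits[0, i].item(), 3) for i, name in enumerate(ASPECTS)}
```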
Frequently Asked Questions
Q: What makes this model unique?
VideoScore's unique strength lies in its ability to provide comprehensive video quality evaluation across multiple dimensions while maintaining high correlation with human judgment. It's particularly notable for its state-of-the-art performance across various benchmarks and its ability to process both visual and textual components.
Q: What are the recommended use cases?
The model is ideal for evaluating AI-generated videos, assessing video quality in production pipelines, and conducting research in video generation and evaluation. It's particularly useful for tasks requiring objective quality metrics that align with human perception.
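As a usage illustration for pipeline integration, the snippet below reuses the hypothetical helpers from the earlier sketches to keep only clips whose visual-quality score clears an arbitrary threshold:

```python
# Hypothetical gating step for a generation pipeline: accept a clip only if its
# visual-quality score clears a chosen threshold (3.0 here is arbitrary).
frames = sample_frames_uniform("sample_clip.mp4", max_frames=16)
scores = score_video(frames, "a corgi running on a beach at sunset")
if scores["visual quality"] >= 3.0:
    print("accepted:", scores)
else:
    print("rejected:", scores)
```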