Estimating the perceptual error between images is an important problem in computer vision with many applications. Although it has been studied extensively, no existing method robustly predicts visual differences the way humans do. Some previous approaches used hand-coded models, but these fail to capture the complexity of the human visual system. Others used machine learning to train models on human-labeled datasets, but creating large, high-quality datasets is difficult because people are unable to assign consistent error labels to distorted images. In this paper, we present PieAPP, a metric that predicts the perceptual error of a distorted image with respect to a reference in a manner consistent with human observers.
Since it is much easier for people to compare two given images and identify the one more similar to a reference than to assign quality scores to each, we propose a new, large-scale dataset labeled with the probability that humans will prefer one image over another. We then train a deep-learning model using a novel, pairwise-learning framework to predict the preference of one distorted image over the other. Our key observation is that our trained network can then be used separately with only one distorted image and a reference to predict its perceptual error, without ever being trained on explicit human perceptual-error labels. The perceptual error estimated by PieAPP is well-correlated with human opinion. Furthermore, it significantly outperforms existing algorithms, while also generalizing to new kinds of distortions, unlike previous learning-based methods.
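The pairwise idea above can be illustrated with a minimal sketch. It is not the PieAPP implementation: the logistic link between two scalar errors, the squared-error loss, and the `toy_error` function (a mean-squared-difference stand-in for the trained network) are all assumptions made for illustration only.

```python
import math

def preference_probability(error_a, error_b):
    """Probability that image A is preferred over image B.

    Assumption: a logistic link on the error difference, so the image
    with the lower perceptual error is the one more likely preferred.
    """
    return 1.0 / (1.0 + math.exp(error_a - error_b))

def pairwise_loss(error_a, error_b, human_prob_a):
    """Assumed loss: squared gap between predicted and human-labeled preference."""
    return (preference_probability(error_a, error_b) - human_prob_a) ** 2

def toy_error(distorted, reference):
    """Hypothetical stand-in for the network: one scalar error per image."""
    return sum((d - r) ** 2 for d, r in zip(distorted, reference)) / len(reference)

# Two distorted versions of the same (tiny, made-up) reference signal.
reference = [0.2, 0.5, 0.8]
image_a = [0.2, 0.5, 0.7]   # close to the reference
image_b = [0.6, 0.1, 0.3]   # far from the reference

err_a = toy_error(image_a, reference)
err_b = toy_error(image_b, reference)
p_a = preference_probability(err_a, err_b)

print(f"error A = {err_a:.4f}, error B = {err_b:.4f}, P(A preferred) = {p_a:.4f}")
```

Training only ever compares `p_a` against a preference label, yet the same scoring function can afterwards be called on a single distorted image and its reference, which mirrors the key observation in the text.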
Paper and Additional Resources to try PieAPPv0.1
Technical details about PieAPPv0.1:
Try out PieAPPv0.1:
Try out the PieAPP dataset:
We evaluate popular and state-of-the-art approaches for image error or quality prediction on our proposed test set, using the Kendall, Pearson, and Spearman correlation coefficients (KRCC, PLCC, and SRCC, respectively, in the table above). This test set is completely disjoint from our proposed training set in terms of both reference images and distortion types (40 test reference images and 31 test distortion types in total). This lets us evaluate how well the learning-based approaches generalize to unseen images and distortion types. More details can be found in our paper and supplementary document.
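For readers unfamiliar with the three coefficients, the following is a small, dependency-free sketch of how each could be computed between a metric's predictions and human scores. The numbers are made up for illustration and are not results from the paper; the Kendall variant shown is tau-a, which is an assumption, since the exact variant used in the evaluation is not stated here.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau-a (KRCC): (concordant - discordant) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return score / (n * (n - 1) / 2)

# Hypothetical predicted errors and human scores for five images.
predicted = [0.1, 0.4, 0.35, 0.8, 0.9]
human = [0.2, 0.3, 0.5, 0.7, 1.0]

print("PLCC:", pearson(predicted, human))
print("SRCC:", spearman(predicted, human))
print("KRCC:", kendall(predicted, human))
```

PLCC measures linear agreement, while SRCC and KRCC depend only on the ordering of the scores, so a metric that ranks images correctly but on a nonlinear scale can still score well on the latter two.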
This project was supported in part by NSF grants IIS-1321168 and IIS-1619376, as well as a Fall 2017 AI Grant (awarded to Ekta Prashnani).