Inverity

PSNR Doesn't Measure Image Quality


Peak Signal-to-Noise Ratio (PSNR) has been a fixture of image compression research for decades, appearing in virtually every paper that proposes a new codec, denoising algorithm, or super-resolution method.

The appeal is obvious: it is simple to compute, easy to compare across papers, and produces a single number that lets researchers rank their systems on a leaderboard.

The problem is that it measures something quite different from what human beings actually perceive as image quality, and the field has known this for a long time while continuing to use PSNR anyway, largely out of habit and convenience. The metric computes the mean squared error between a reference image and a processed one, then expresses the ratio of the peak signal power to that error on a logarithmic decibel scale. Every pixel therefore contributes equally to the final score, regardless of where it sits in the image, what surrounds it, or how the human visual system would weight it.
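For 8-bit images the whole computation reduces to a few lines. A minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def psnr(reference: np.ndarray, processed: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in decibels.

    Every pixel contributes equally to the mean squared error,
    regardless of its location or perceptual salience.
    """
    mse = np.mean((reference.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

The `peak` parameter is the maximum possible pixel value (255 for 8-bit images); nothing in the formula knows anything about edges, textures, or viewing conditions.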

A blur that smears an edge across five pixels and a compression artifact that introduces a sharp false contour of identical magnitude will produce the same PSNR penalty, even though one might be nearly invisible to a viewer while the other is immediately distracting.
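The following sketch makes this concrete (illustrative NumPy; pixel clipping is ignored for simplicity). It builds two distortions with identical mean squared error, one diffuse across the whole image and one concentrated into a sharp false contour, and shows that PSNR cannot distinguish them:

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# Distortion A: a small error spread evenly over every pixel (blur-like).
diffuse = reference + 4.0

# Distortion B: the same total squared error concentrated into one sharp
# false contour down a single column (all 4096 pixels' worth of error
# carried by 64 pixels).
contour = reference.copy()
contour[:, 32] += 4.0 * np.sqrt(64.0)

def psnr(ref, img, peak=255.0):
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# Identical MSE, hence identical PSNR, despite looking nothing alike.
print(psnr(reference, diffuse), psnr(reference, contour))
```

Both calls return exactly the same value, even though a uniform +4 offset is close to invisible while a bright seam down the middle of the image is the first thing a viewer would see.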

Texture, sharpness, and structural coherence, the things people actually notice and care about, are not meaningfully captured. A half-decibel improvement in PSNR can correspond to a visible degradation in perceived quality, and vice versa.

Metrics like SSIM and later LPIPS were developed precisely to address this gap, and perceptual studies have confirmed repeatedly that they correlate far better with human judgments. None of this makes PSNR useless in every context. It remains a reasonable sanity check and a useful lower bound on fidelity, and for certain applications like medical imaging where numerical accuracy genuinely matters, pixel-level error has direct meaning. But using it as the primary or sole measure of image quality in a consumer or artistic context is a category error dressed up in scientific clothing.
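The contrast with a structural metric is visible even in a toy form. The sketch below implements the single-window ("global") variant of the SSIM formula, comparing means, variances, and covariance instead of raw pixel differences; real implementations average this over sliding Gaussian windows, so treat it as illustrative only:

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray, peak: float = 255.0) -> float:
    """Simplified single-window SSIM.

    Compares luminance (means), contrast (variances), and structure
    (covariance). Production SSIM computes this over local 11x11
    Gaussian windows and averages the resulting map.
    """
    c1 = (0.01 * peak) ** 2  # standard stabilizing constants
    c2 = (0.03 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

Because the score depends on covariance, a distortion that preserves local structure (a small uniform shift) scores far better than one that destroys it, which is exactly the distinction MSE-based PSNR throws away.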

The persistence of PSNR on leaderboards reflects the pressure to produce comparable numbers more than it reflects any belief that the metric is the right one, and that gap between what we measure and what we care about quietly corrupts research priorities.

Methods optimized for PSNR learn to minimize average error in ways that can actively damage the perceptual qualities a human observer would value, and papers that win on the table sometimes lose badly in practice.