How AI Understands Visual Quality
Image quality assessment has evolved through three distinct eras. Each era solved the problems of the previous one. Each era created new problems that drove the next.
Era One: Mathematical Convenience
From the 1970s through the 1990s, the standard was MSE (mean squared error) and its sibling PSNR (peak signal-to-noise ratio), which is MSE expressed on a logarithmic decibel scale. These metrics calculate pixel-level differences. They are fast. They are deterministic. They are easy to implement in hardware.
They are also wrong about what humans see. A uniform blur and a localized artifact can produce the same PSNR. One is acceptable. The other ruins an image. PSNR cannot tell them apart.
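To make the point concrete, here is a minimal sketch in plain Python. The 16-pixel grey "image" and both distortions are hypothetical toy data, chosen as simple stand-ins for a mild global change and a concentrated local defect. Both end up with the same MSE, and therefore the same PSNR.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, peak=255.0):
    """PSNR in decibels for a given peak value; infinite for identical images."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)

# Hypothetical toy data: a flat grey image of 16 pixels.
reference = [128] * 16

# Distortion A: every pixel is off by 2 (a faint change spread over the whole image).
uniform = [130] * 16

# Distortion B: a single pixel is off by 8, everything else is untouched.
localized = [128] * 16
localized[5] = 136

psnr_uniform = psnr(reference, uniform)      # MSE = 4
psnr_localized = psnr(reference, localized)  # MSE = 4 as well
```

Both calls return the same value, roughly 42.1 dB, even though the error is spread evenly in one case and concentrated in a single spot in the other. The metric only sees the total squared error, not where it lands.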
Era Two: Structural and Information-Theoretic Improvements
The 2000s brought SSIM (Structural Similarity Index), its multi-scale variant MS-SSIM, and VIF (Visual Information Fidelity). These metrics moved beyond pixel differences to measure structural relationships and information content. SSIM correlates better with human judgment than PSNR. It is still a hand-crafted formula. In its standard form it averages local scores uniformly across the image, so it still assumes that all image regions matter equally.
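The SSIM formula compares means, variances, and covariance rather than raw pixel differences. The sketch below is a simplification: it computes the statistics once over the whole signal, whereas standard SSIM computes them in small sliding windows and averages the local results. The function name and the toy pixel values are assumptions for illustration; the constants follow the commonly used defaults K1 = 0.01 and K2 = 0.03.

```python
def global_ssim(x, y, peak=255.0):
    """Single-window SSIM over two equal-length pixel sequences.

    Simplification: real SSIM is computed over local windows and then averaged.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    # Small constants that stabilize the division when means or variances are near zero.
    c1 = (0.01 * peak) ** 2
    c2 = (0.03 * peak) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Hypothetical toy data: a brightness ramp and two distorted versions of it.
ramp = [50, 100, 150, 200]
brighter = [60, 110, 160, 210]    # same structure, shifted brightness
flattened = [125, 125, 125, 125]  # structure wiped out entirely

score_brighter = global_ssim(ramp, brighter)
score_flattened = global_ssim(ramp, flattened)
```

The brightness shift keeps the structure intact and scores close to 1, while the flattened version scores close to 0. A pixel-difference metric would not separate the two cases on structural grounds.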
Era Three: Machine Learning
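The 2010s introduced learned metrics such as NIQE and VMAF. VMAF is trained on human judgments: it fuses several elementary quality features with a regression model fitted to subjective scores. NIQE is also learned from data, but from the statistics of pristine natural images rather than from human ratings. Together they represent a fundamental shift: instead of engineers writing formulas for quality, engineers fit models to data, and in the case of opinion-trained metrics, to what humans actually think.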
The training process is straightforward in concept and difficult in execution. Researchers show human subjects thousands of image pairs. Each subject rates which image looks better. The dataset grows to hundreds of thousands of judgments. A model learns to predict the human rating from the image pixels.
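Before a model can be trained, the raw pairwise judgments have to be turned into a per-image target. The sketch below uses the simplest possible aggregation, a win rate per image. This is an assumption for illustration only; published datasets use more careful protocols and statistical models. The image identifiers and votes are hypothetical.

```python
from collections import defaultdict

def win_rates(judgments):
    """Turn (winner_id, loser_id) votes into a score in [0, 1] per image."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Fraction of comparisons each image won, out of those it appeared in.
    return {image: wins[image] / appearances[image] for image in appearances}

# Hypothetical votes: each tuple means "the first image looked better than the second".
judgments = [
    ("a", "b"),
    ("a", "c"),
    ("b", "c"),
    ("a", "b"),
    ("c", "b"),
]

scores = win_rates(judgments)
```

Here image "a" wins every comparison it appears in, so it gets the highest score. Scores like these, collected at scale, are the targets a quality model is trained to predict from pixels.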
Key datasets include TID2013, LIVE, and KADID-10k. These contain images with specific distortions (blur, noise, compression artifacts, color shifts) rated by human panels.
Modern AI Perception Models
Current models use deep learning architectures. Convolutional neural networks process image features at multiple scales. Transformers use attention mechanisms to weight the regions that matter most to the score, which tend to be the regions a human would examine. The models also learn semantic content: they can distinguish a face from a wall, and a product edge from a shadow.
This creates two capabilities that traditional metrics lack.
First, handling complex distortions. A modern image may suffer from blocking, ringing, blur, and noise simultaneously. PSNR and SSIM struggle with compound distortions. AI models assess the combined perceptual impact.
Second, content-aware scoring. A product photo, a landscape, and a UI screenshot have different quality requirements. AI models adjust their scoring based on what the image contains and what it is for.
The Alignment Problem
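The challenge is correlation. A model must predict human mean opinion scores (MOS) accurately across image types, distortion types, and viewing conditions. Agreement is typically reported as the Pearson linear correlation (PLCC) and the Spearman rank-order correlation (SROCC) between predicted scores and MOS. Achieving high agreement requires diverse training data, careful validation, and continuous refinement against real-world performance.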
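Correlation with MOS can be checked with two standard statistics: Pearson correlation, which measures linear agreement, and Spearman correlation, which measures whether the model ranks images in the same order as humans. The sketch below implements both in plain Python. The MOS values and model scores are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: human MOS for five images and a model's raw scores.
mos = [1.2, 2.5, 3.1, 4.0, 4.8]
predicted = [10, 30, 35, 60, 95]

plcc = pearson(predicted, mos)
srocc = spearman(predicted, mos)
```

In this toy case the model orders the images exactly as the human panel does, so the Spearman value is 1, while the Pearson value is lower because the relationship is monotonic but not linear. Reporting both separates ranking errors from scale mismatches.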
The Inverity Application
At Inverity, we integrate perceptual models into optimization pipelines. We operate two tiers: near-lossless and lossless. Each tier has a different perceptual threshold. Near-lossless keeps distortion below the just-noticeable difference (JND) for standard web use. Lossless preserves every bit for archival and print.
We validate these scores against engagement metrics. Do product images with higher perceptual scores convert better? Do hero banners with preserved edge detail drive more clicks? The feedback loop refines the model.
The next article defines what "optimized" means when you have accurate perceptual measurement: the equilibrium between quality, bandwidth, and user experience.
Inverity