Inverity
Technology

Why File Size Is the Wrong Metric

Author

Brandon Cade

Put two compressed images side by side. They share the same PSNR (peak signal-to-noise ratio), the number most of the industry uses to compare image quality. One looks crisp. The other is a smear. The metric says they're equal. Your eyes say otherwise, and your eyes are right.

That gap is not a rounding error. It's the whole problem. Most compression gets compared on file size at a target PSNR, and that target measures something close to, but importantly different from, what anyone actually cares about. Optimize the wrong number and you ship the wrong file: bigger than it needs to be, or worse-looking than a competitor's at the same size.

Key Takeaways

  • PSNR measures pixel-level error, which correlates poorly with human perception. Two images with identical PSNR can look dramatically different.
  • The objective that matters is perceived quality per byte, not bytes at a fixed pixel-error target.
  • File size is a cost to minimize under a quality floor, not a stand-in for quality itself.

So what is PSNR really measuring, and why does it drift so far from the thing you're trying to judge?

What does PSNR actually measure?

PSNR measures pixel-level error: the mean squared difference between the original and the compressed version, taken relative to the peak pixel value and expressed on a logarithmic (decibel) scale. It's the de facto standard for codec comparison, and it earned that status honestly. It's simple, fast, and easy to compute. But pixel difference is not perception, and that distinction is where the trouble starts (HLIC, arXiv 2109.14863, 2021).

The mechanism is the giveaway. PSNR weights every pixel equally and averages the squared error across the entire frame. A bright, structurally important edge and a flat patch of sky contribute to the score the same way, pixel for pixel. The human visual system does nothing of the sort.
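To make the equal weighting concrete, here is a minimal sketch of the standard PSNR formula in plain Python. It assumes 8-bit grayscale images flattened into lists of integers. The function name and that input layout are choices made for illustration, not anything taken from a particular codec or library.

```python
import math

def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    Assumes flattened grayscale pixels in the range 0..max_value.
    """
    if not original or len(original) != len(compressed):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error: each pixel's error counts the same,
    # no matter where in the frame it sits or what it depicts.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical images have no measurable error
    # Ratio of peak signal power to error power, on a log (decibel) scale.
    return 10 * math.log10(max_value ** 2 / mse)

# Usage: a four-pixel "image" where every pixel is off by 10.
print(round(psnr([0, 0, 0, 0], [10, 10, 10, 10]), 2))
```

Nothing in the function knows whether a pixel belongs to an edge, a face, or a patch of sky. Position and content never enter the calculation.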

That's the gap in one sentence. PSNR treats an image as a grid of numbers. A person treats it as a scene. The first is easy to measure. The second is what you're actually selling.

Why doesn't PSNR match what you see?

Because human vision is non-linear and spatially selective, and PSNR is neither. The eye doesn't average error evenly across a frame. It concentrates on edges, faces, motion, and contrast, and it's remarkably forgiving in regions it treats as background. PSNR has no concept of any of this.

The consequence is easy to demonstrate. Two images can share an average PSNR of around 19 dB while one stays clearly detailed and the other dissolves into blur. The metric averages the error across the whole frame and hides exactly where perception would have flinched (TestDevLab, Full-Reference Quality Metrics, 2024). Same number. Different image.
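A toy version of "same number, different image" fits in a few lines. This is a sketch with made-up pixel values, not a reproduction of the cited example: a flat 100-pixel frame is distorted once with a tiny error everywhere and once with a large error on just four pixels. Both distortions have the same mean squared error, so PSNR cannot tell them apart.

```python
import math

def psnr(original, compressed, max_value=255):
    """PSNR in dB for two equal-length lists of 8-bit grayscale pixels."""
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

# Hypothetical flat frame of 100 mid-gray pixels.
original = [128] * 100

# Distortion A: every pixel is off by 2 (squared error 4 per pixel).
spread = [p + 2 for p in original]

# Distortion B: four pixels are off by 10, the rest are untouched
# (4 * 100 squared error spread over 100 pixels is also an MSE of 4).
clustered = list(original)
for i in range(4):
    clustered[i] += 10

psnr_spread = psnr(original, spread)
psnr_clustered = psnr(original, clustered)
worst_spread = max(abs(a - b) for a, b in zip(original, spread))
worst_clustered = max(abs(a - b) for a, b in zip(original, clustered))

print(psnr_spread == psnr_clustered)   # the scores match
print(worst_spread, worst_clustered)   # the worst-case errors do not
```

If those four pixels sat on an eye or a line of text, a viewer would likely notice distortion B first, yet the averaged score treats the two as interchangeable.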

There's a deeper reason underneath the averaging problem. Human vision carries non-linearities that simple pixel-and-colorspace math doesn't capture. The research behind perceptual encoders made this point years ago: some effects in the visual system simply don't survive a linear transform like RGB to YUV (Guetzli, arXiv 1703.04421, 2017). The upshot is that a metric built on linear pixel error is structurally unable to model how people actually see.


Isn't SSIM the fix?

SSIM is a real improvement, just not the finish line. Instead of raw pixel error, it compares luminance, contrast, and structure, which lines up better with how people judge similarity, and it tends to correlate more closely with subjective quality than PSNR does (Visionular, Making Sense of PSNR, SSIM, VMAF, 2024). For many comparisons, it's the better tool.
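For a feel of what "luminance, contrast, and structure" means in practice, here is a deliberately simplified sketch of the SSIM formula computed once over a whole image. Real SSIM implementations score small local windows and average the results, so treat this single-window version as an illustration of the terms rather than a substitute for a library implementation. The function name and the flattened-list input are assumptions made for the example.

```python
def global_ssim(x, y, max_value=255):
    """Single-window SSIM over two equal-length lists of grayscale pixels.

    Simplification: production SSIM uses local sliding windows and averages
    the per-window scores. This computes one score for the whole image.
    """
    n = len(x)
    mean_x = sum(x) / n                      # luminance of each image
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n   # contrast of each image
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    # Covariance captures whether the two images vary together (structure).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants from the SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Usage: a brightness shift keeps the structure, flattening destroys it.
ramp = [50, 100, 150, 200]
shifted = [p + 10 for p in ramp]   # same shape, slightly brighter
flat = [125] * 4                   # same average brightness, no structure
print(global_ssim(ramp, shifted) > global_ssim(ramp, flat))
```

The score rewards the shifted image for preserving the pattern of the original, which is closer to how a person would judge the two than a raw error sum.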

But "better" isn't "sufficient." SSIM still misses content-dependent and compression-specific artifacts, the kind that show up in some images and not others, and its accuracy varies with the complexity of the content. No single structural formula captures everything the eye notices.

That gap is exactly why purpose-built perceptual metrics exist. SSIMULACRA2 and Butteraugli were designed specifically to score the artifacts compression introduces, in a way that tracks human opinion. Learned metrics like Netflix's VMAF go further, fusing several elementary measures with machine learning because no single one generalized across all content. And even these have failure cases: Butteraugli, for instance, has been critiqued as unstable in certain quality regimes. The honest takeaway: no metric is ground truth. They're progressively better approximations of a target that ultimately lives in human perception.

What should you optimize instead?

Perceived quality per byte. Not bytes at a fixed pixel-error target, but the smallest file that preserves the quality people actually notice. That means measuring with perceptual metrics, validating against human judgment where it counts, and then asking how few bytes hold the line.
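One way to picture "how few bytes hold the line" is a search over encoder settings. The sketch below is hypothetical throughout: `encode` and `score` are placeholder callables you would supply (for example, a wrapper around your encoder and a perceptual metric of your choice), and it assumes that both the score and the file size rise as the quality setting rises. Under those assumptions, a binary search finds the lowest setting that still clears a perceptual floor.

```python
from typing import Callable, Optional, Tuple

def smallest_passing_encode(
    encode: Callable[[int], bytes],    # hypothetical: quality setting -> encoded file
    score: Callable[[bytes], float],   # hypothetical: encoded file -> perceptual score
    quality_floor: float,
    lo: int = 1,
    hi: int = 100,
) -> Optional[Tuple[int, bytes]]:
    """Return (setting, file) for the lowest setting whose score meets the floor.

    Assumes score is non-decreasing in the quality setting. Returns None if
    no setting in [lo, hi] reaches the floor.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        candidate = encode(mid)
        if score(candidate) >= quality_floor:
            best = (mid, candidate)   # passes: remember it, then try lower
            hi = mid - 1
        else:
            lo = mid + 1              # fails: need a higher setting
    return best

# Usage with stand-in functions: size and "score" both grow with the setting.
fake_encode = lambda q: b"x" * q
fake_score = lambda data: float(len(data))
result = smallest_passing_encode(fake_encode, fake_score, quality_floor=37.0)
print(result[0], len(result[1]))
```

The point of the design is the order of operations: the floor is set in perceptual terms first, and file size is whatever falls out of meeting it as cheaply as possible. Where a metric and human reviewers disagree, the floor should follow the reviewers.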

This reframe has a consequence that sounds counterintuitive until you sit with it. A codec optimized for perception can produce a smaller file that also looks better. There's no contradiction, because the two goals were never really opposed. PSNR-chasing spends bits buying pixel accuracy in regions no one perceives, fidelity you pay for in file size and the viewer never benefits from. Stop spending on the invisible, and you free up budget for what matters.

That's the link between this and the cost conversation. Bits you spend on imperceptible accuracy are bits you pay to store and ship, every time the asset is served. Measuring quality the way humans see it isn't just an accuracy improvement. It's a cost argument too.

Why does this reframe matter for compression?

Because the metric you choose to optimize quietly defines the product you build. Optimize for pixel fidelity and you build a codec that wins benchmarks and can still lose to the eye. Optimize for perception and you're building something different from the ground up: a system whose entire job is to preserve what people notice and discard what they don't.

That's not a tuning decision. It's a design philosophy, and it shapes every tradeoff downstream. "Better" stops meaning "closer to the original pixel values" and starts meaning "indistinguishable to a human at a smaller size."

It's the same perception-first logic that runs through how we think about measuring quality across the whole pipeline. For the broader argument, see [INTERNAL-LINK: perceptual decisioning → Pillar 4 hub], and for where this meets real delivery cost, [INTERNAL-LINK: image weight and Core Web Vitals → Pillar 3 cost post].

So is file size useless?

Not at all. File size matters enormously. It's just not the quality; it's the cost. The mistake is using a cost as a proxy for quality, or worse, treating a pixel-error score like PSNR as the quality target and minimizing file size against it as if that were the goal.

Put file size back where it belongs: as the thing you minimize, subject to a perceptual-quality floor. The right question isn't "what's the smallest file at this PSNR?" It's "what's the smallest file that still looks right to a person?" Same term, file size, completely different role. In the first question it's measured against a target that misleads. In the second it's a cost minimized against a constraint that clarifies.


Frequently Asked Questions

Why doesn't PSNR match what I see?

PSNR measures average pixel error on a logarithmic scale, but human vision is non-linear and spatially selective. The eye concentrates on edges, faces, and contrast while forgiving background regions, so two images with the same PSNR can look very different (TestDevLab, 2024).

Is a smaller file always lower quality?

No. A codec optimized for perception can deliver a smaller file that looks as good or better, because it stops spending bits on fidelity humans don't perceive. Smaller and better-looking are not mutually exclusive once you optimize for the right target.

What's the best image quality metric?

There isn't a single best one. Perceptual metrics like SSIMULACRA2, Butteraugli, and VMAF correlate better with human perception than PSNR, but each has failure cases. Human judgment remains the ultimate reference; the metrics are approximations of it.

Should I stop using PSNR entirely?

Not entirely. It's a useful sanity check and a long-standing standard for comparison. Just don't treat it as a stand-in for perceived quality, and don't optimize file size against it as though the number were the goal.

The bottom line

File size is the wrong metric for quality because it was never a quality metric to begin with, and PSNR, the number usually paired with it, measures pixels rather than perception. Two images can post the same score and look nothing alike. The fix isn't a better single number; it's a better objective: perceived quality per byte, measured with perceptual tools and grounded in human judgment.

Get the objective right and the rest follows. Compression stops being a contest to minimize distortion and becomes what it always was underneath: a perception problem. That's the lens we build through at Inverity, and it starts with refusing to optimize the wrong number. For the full picture, start with [INTERNAL-LINK: perceptual decisioning → Pillar 4 hub].