Inverity

When Neural Compression Fails: The Limits of Learned Codecs

Author: Brandon Cade

Learned codecs now beat JPEG, WebP, and H.264 on the standard rate-distortion benchmarks. That much is settled. What the leaderboards don't show is how those same models behave the moment a real workload hands them something their training set never contained. That gap, between benchmark and deployment, is where neural compression actually fails, and it fails in ways a classical codec never would.

Key Takeaways

  • Neural codecs degrade sharply under distribution shift; a NeurIPS 2023 benchmark introduced 15 corruption types specifically to measure it (Kodak-C / CLIC-C, 2023).
  • The dangerous failures aren't visible noise; they're confident, plausible-looking reconstructions that are wrong where it matters.
  • Classical codecs fail legibly; neural codecs fail silently. That difference, not raw efficiency, should drive where you deploy each.

The point isn't that neural compression is overhyped. It's that the conditions under which it genuinely wins are narrower than its advocates tend to admit, and the cost of being wrong about those conditions is higher than with any classical format. So where does the line actually fall?

Why does neural compression fail on out-of-distribution data?

Distribution shift is the primary failure mode. A NeurIPS 2023 study had to introduce 15 separate corruption types to the Kodak and CLIC benchmarks just to measure how badly learned codecs degrade off-distribution, because no existing dataset captured it (Kodak-C / CLIC-C, 2023). The short version: models that win on clean natural photos do not hold that lead when the input drifts.

A codec trained overwhelmingly on natural photographs has learned the statistics of natural photographs. Hand it a chest radiograph, a satellite swath, or a hand-drawn schematic, and it has no prior for what it's seeing. It doesn't know it's lost. That's the problem.

The NeurIPS work traced this to a spectral bias: neural codecs allocate their learned capacity to the frequency bands that dominate their training data, and they under-serve the bands they rarely saw. When an out-of-distribution image carries its information in those neglected bands, the model quietly drops it.
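To make "frequency bands" concrete, here is a rough diagnostic sketch, assuming a 2-D grayscale NumPy array. It is not the NeurIPS paper's methodology: it splits an image's power spectrum into equal-width radial bands and reports what fraction of the energy each band holds. The function name `band_energy_fractions` and the four-band split are illustrative assumptions. An input whose energy sits mostly in the high bands, such as a line drawing or dense texture, may be exactly the kind of content a photo-trained codec under-serves.

```python
import numpy as np

def band_energy_fractions(image, n_bands=4):
    """Fraction of (non-DC) spectral energy in each radial frequency band.

    Hypothetical diagnostic: bands are equal-width rings in normalized
    radial frequency, from 0 (DC) to 1 (the Nyquist corner).
    """
    img = np.asarray(image, dtype=np.float64)
    img = img - img.mean()                      # drop DC so a flat offset doesn't dominate
    power = np.abs(np.fft.fft2(img)) ** 2       # 2-D power spectrum
    fy = np.fft.fftfreq(img.shape[0])[:, None]  # cycles/pixel in [-0.5, 0.5)
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2) / np.sqrt(0.5)  # normalize so the corner is 1.0
    edges = np.linspace(0.0, 1.0, n_bands + 1)
    total = power.sum()
    fractions = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open rings, except the last band, which also takes radius == 1.0.
        mask = (radius >= lo) & (radius < hi) if hi < 1.0 else (radius >= lo)
        fractions.append(power[mask].sum() / total if total > 0 else 0.0)
    return np.array(fractions)
```

A smooth, slowly varying gradient puts nearly all of its energy in the lowest band, while a one-pixel checkerboard puts nearly all of it in the highest. Any threshold you would gate a codec on is a tuning decision this sketch does not supply.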

This is exactly why our multi-stage compression pipeline doesn't route every asset through an end-to-end neural model. The model is one stage, gated by what the content actually is, not the default path for everything.

How is a neural codec's failure different from JPEG's?

The difference is legibility. JPEG fails the same way every time: blocking, ringing, visible mush. You can see it, predict it, and dial quality up to avoid it. A 2023 robustness benchmark found neural codecs introduce structured, content-dependent distortions instead (Kodak-C / CLIC-C, 2023). You often can't see those at all.

That invisibility is the danger. A classical codec operates on signal statistics and makes no assumptions about meaning, so when it breaks, it breaks honestly. A neural codec reconstructs from learned priors, which means under stress it produces something plausible rather than something faithful.
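The "breaks honestly" point can be made precise for the transform-and-quantize core that JPEG is built on. The sketch below is simplified: it assumes a single uniform quantization step for all 64 coefficients, whereas real JPEG uses a per-frequency quantization table, integer pixel rounding, and usually chroma subsampling. Because the 8×8 DCT is orthonormal, squared error in the coefficient domain equals squared error in pixel space, so per-pixel MSE can never exceed (step/2)² no matter what the block contains. The quality setting picks the step, and the step caps the error.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis; each row is one basis vector."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def quantize_block(block, step):
    """2-D DCT a square block, uniformly quantize every coefficient, then invert."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T
    quantized = np.round(coeffs / step) * step  # each coefficient moves by at most step/2
    return d.T @ quantized @ d

def worst_case_block_mse(step):
    # Orthonormal transform preserves squared error (Parseval), so the
    # per-pixel MSE is bounded by the per-coefficient bound (step/2)**2.
    return (step / 2) ** 2
```

A learned decoder offers no comparable content-independent cap: its error depends on how well the input matches its priors, which is precisely what goes wrong off-distribution.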

Think about what "plausible but wrong" means in the wrong setting. A smoothed-over lesion boundary in a radiograph. A warped tolerance on an engineering drawing. The file looks clean. The error is invisible until it matters, and by then the original is gone.

Our rule of thumb: if a human can't visually audit the reconstruction against ground truth, a silent neural error is a liability, not an efficiency gain. That single test rules out more deployment targets than any rate-distortion number.

This is why we treat failure legibility, not compression ratio, as the first question when deciding whether a learned codec belongs in a given pipeline. Efficiency you can measure on a benchmark. Trust you can't.
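One reason a benchmark score is not a trust signal: global metrics average away localized errors. The sketch below compares whole-image PSNR with the worst mean absolute error over any single tile. The function names and the 16-pixel tile size are illustrative assumptions, and a check like this supplements visual review rather than replacing it.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Whole-image peak signal-to-noise ratio in dB."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak**2 / mse)

def worst_tile_error(reference, test, tile=16):
    """Largest mean absolute error over non-overlapping tile x tile windows."""
    diff = np.abs(np.asarray(reference, float) - np.asarray(test, float))
    h, w = diff.shape
    worst = 0.0
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            worst = max(worst, diff[y:y + tile, x:x + tile].mean())
    return worst
```

For example, shifting a single 16×16 patch of a 512×512 image by 40 gray levels still yields a PSNR just above 46 dB, a score that would usually read as excellent, while the worst tile is off by the full 40 levels. A smoothed boundary in one small region is the kind of error a global average hides.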

What does neural compression cost to actually run?

Deployment cost is the second gate, and it's steep. Learned codecs need specialized hardware to hit usable speeds, and the autoregressive context models that top the quality charts decode sequentially, which can run two orders of magnitude slower than the parallel-decoding alternatives (LVQAC, arXiv 2304.12319, 2023). For real-time or low-power targets, that math often doesn't close.
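A back-of-the-envelope count shows where the sequential cost comes from. This sketch assumes a latent downsampled 16× in each dimension (common in hyperprior-style architectures, but an assumption here) and counts only dependent context-model steps, not wall-clock time, which also depends on network size and hardware. A raster-scan autoregressive context model needs one step per latent position; a two-pass checkerboard-style context model, one published parallel alternative, needs two regardless of image size.

```python
def sequential_decode_steps(height, width, downsample=16, context="serial"):
    """Count dependent context-model evaluations needed to decode one latent.

    Hypothetical cost model: ignores per-step compute and hardware parallelism.
    """
    lat_h = -(-height // downsample)  # ceiling division
    lat_w = -(-width // downsample)
    if context == "serial":        # raster-scan autoregressive: one step per position
        return lat_h * lat_w
    if context == "checkerboard":  # anchors first, then all non-anchors in parallel
        return 2
    if context == "none":          # hyperprior only: every position decoded at once
        return 1
    raise ValueError(f"unknown context model: {context}")
```

For a 768×512 Kodak-sized image that is 48 × 32 = 1,536 dependent steps versus 2. That serialization, rather than total arithmetic, is largely what keeps the decoder from exploiting parallel hardware.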

Then there's fragility across versions. Classical formats have stable, widely implemented specifications. A JPEG from 2004 still decodes everywhere. Neural codecs tie the bitstream to a specific encoder-decoder pair, and a version mismatch can corrupt reconstruction.

For anyone storing data across decades, or shipping to endpoints they don't control, that dependency is an archival risk classical formats simply don't carry.
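Version fragility can at least be made loud rather than silent. A minimal sketch, assuming a hypothetical container format (`NCv1` is an invented tag, not an existing standard): store a SHA-256 fingerprint of the decoder weights in the header and refuse to decode on a mismatch.

```python
import hashlib
import struct

MAGIC = b"NCv1"  # hypothetical container tag, not a real format

def model_fingerprint(weights: bytes) -> bytes:
    """SHA-256 of the serialized decoder weights identifies the exact model."""
    return hashlib.sha256(weights).digest()

def wrap(payload: bytes, weights: bytes) -> bytes:
    # Header layout: 4-byte magic, 32-byte fingerprint, 8-byte big-endian length.
    return MAGIC + model_fingerprint(weights) + struct.pack(">Q", len(payload)) + payload

def unwrap(blob: bytes, weights: bytes) -> bytes:
    if blob[:4] != MAGIC:
        raise ValueError("not a recognized container")
    if blob[4:36] != model_fingerprint(weights):
        raise ValueError("bitstream was encoded for a different model; refusing to decode")
    (length,) = struct.unpack(">Q", blob[36:44])
    payload = blob[44:44 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload
```

This turns a quietly corrupted reconstruction into an explicit error; it does not remove the archival dependency, since you still need those exact weights decades later. It also does not address floating-point differences across hardware, a separate, known source of decode mismatch in learned codecs.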

When should you actually use a neural codec?

Use one when the content matches the training distribution, the reconstruction is human-auditable, and you control both ends of the pipeline. A 2023 benchmark confirmed neural codecs hold their rate-distortion lead on clean, in-distribution natural images (Kodak-C / CLIC-C, 2023), which is a real and common case, just not a universal one.

E-commerce product photography fits. Marketing imagery fits. These are natural-image domains, served through pipelines you own, where a human reviews the output anyway. That's the sweet spot.

Archival of irreplaceable records doesn't fit. Medical, scientific, and engineering imagery, where a silent error is unacceptable, doesn't fit. Heterogeneous delivery to clients running who-knows-what doesn't fit. Knowing which situation you're in matters more than any single benchmark score. For how we make that routing decision per asset, see How Inverity Works.
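The three conditions for using a neural codec reduce to a short checklist. The sketch below is a hypothetical encoding of that checklist, not how Inverity's pipeline actually decides; the `Asset` fields are assumptions, and in practice each flag is the hard part (deciding `in_distribution`, for instance, might require a content classifier).

```python
from dataclasses import dataclass

@dataclass
class Asset:
    in_distribution: bool     # content resembles the codec's training data
    human_auditable: bool     # a person can compare output against the original
    controls_both_ends: bool  # you ship both the encoder and the decoder
    irreplaceable: bool       # archival, medical, scientific, or engineering record

def choose_codec(asset: Asset) -> str:
    """Return 'neural' only when every condition holds; otherwise fall back."""
    if asset.irreplaceable:
        return "classical"  # a silent error here is unacceptable
    if asset.in_distribution and asset.human_auditable and asset.controls_both_ends:
        return "neural"
    return "classical"
```

The design choice is deliberate asymmetry: any single failed condition sends the asset to the classical path, because the cost of a silent error outweighs the efficiency lost by being conservative.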


Frequently Asked Questions

Is neural compression better than JPEG?

On clean, in-distribution natural images, yes: learned codecs beat JPEG on rate-distortion, a result confirmed on both the Kodak and CLIC benchmarks (Kodak-C / CLIC-C, 2023). Off-distribution, the advantage can vanish or invert, so "better" depends entirely on the workload.

Why do neural codecs hallucinate detail?

Because they reconstruct from learned priors rather than from signal statistics alone. A 2023 NeurIPS benchmark tied this to spectral bias: the model fills neglected frequency bands with what's statistically likely (Kodak-C / CLIC-C, 2023), producing plausible but unfaithful detail under distribution shift.

Are neural codecs safe for medical or scientific images?

Generally no, not without human audit. The failure mode is silent, structured distortion, and a 2023 robustness study showed these errors are often visually undetectable (Kodak-C / CLIC-C, 2023). For imagery where a smoothed boundary changes meaning, that risk usually outweighs the efficiency gain.

Do neural codecs have archival risk?

Yes. Reconstruction depends on a specific encoder-decoder version, and mismatches can corrupt output, unlike classical formats with stable specifications. For multi-decade storage or delivery to endpoints you don't control, that version dependency is a real operational liability.

The bottom line

Neural compression solves one problem extremely well and a broader set of problems poorly. It wins on natural images, in pipelines you control, where a human can audit the result. It fails, often silently, the moment any of those conditions break.

The leaderboards measure efficiency. Production demands legibility of failure, and that's the axis that should decide where a learned codec earns its place. At Inverity, that judgment is built into the pipeline: we route each asset to the stage that fits it, rather than forcing everything through one neural model. If your compression strategy assumes neural is always the answer, it's optimizing for the benchmark, not the workload.