Apple's PICO Codec Validated Perceptual Compression
Author
Brandon Cade
Apple just published a learned image codec, and the interesting part isn't the compression numbers. It's what they chose to optimize for. PICO, short for Perceptual Image Codec, is built around two goals held at once: perceptual quality and on-device runtime. Not pixel fidelity. Not BD-rate in isolation. Perception and speed, together, treated as the actual objective.
That design choice is the news. For years the dominant way to compare codecs has been rate-distortion: squeeze the most quality, as measured by pixel-accuracy metrics, into the fewest bits. PICO is a deliberate move away from that frame, from one of the largest companies on earth, and it lands on exactly the position we have been arguing all along. Compression is a perception problem, and latency is not a footnote.
Key Takeaways
- Apple's PICO codec is jointly optimized for perceptual quality and on-device latency, not rate-distortion alone.
- It reports 2.3 to 3x bitrate savings over AV1, AV2, VVC, ECM, and JPEG AI, validated through subjective user studies rather than PSNR.
- The takeaway isn't the numbers; it's the design philosophy. The industry's most resourced players are now optimizing for perception and speed, the same two axes that matter in production.
What did Apple actually build?
PICO is a learned image codec from Apple's research team, introduced in a May 2026 paper on practical learned image compression (Rippel et al., What Matters in Practical Learned Image Compression, 2026). What sets it apart is the optimization target: the team explicitly designed it for perceptual quality and runtime at the same time, then ran a performance-aware architecture search across millions of model configurations to find designs that hit an on-device latency budget while maximizing perceptual compression performance.
The headline results are strong. Apple reports PICO achieves 2.3 to 3x bitrate savings against AV1, AV2, VVC, ECM, and JPEG AI, and 20 to 40 percent savings against the best learned-codec alternatives (Rippel et al., 2026). On an iPhone 17 Pro Max, it encodes a 12-megapixel image in as little as 230 milliseconds and decodes it in 150 milliseconds, faster than most state-of-the-art learned codecs run on a datacenter-class NVIDIA V100 GPU.
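To put those figures on a common scale, here is a small arithmetic sketch. It assumes an "N x saving" means the baseline codec needs N times the bits of PICO at matched perceived quality. That is our reading of the phrasing, not a definition quoted from the paper, and the helper names are ours.

```python
def savings_factor_to_percent(factor: float) -> float:
    """Convert an 'N x' bitrate saving into the percent of bits removed.

    Assumption: 'N x saving' means the baseline needs N times the bits
    of the new codec at matched perceived quality.
    """
    return (1.0 - 1.0 / factor) * 100.0


def percent_to_savings_factor(percent: float) -> float:
    """Convert 'P percent fewer bits' into the equivalent 'N x' factor."""
    return 1.0 / (1.0 - percent / 100.0)


def megapixels_per_second(megapixels: float, milliseconds: float) -> float:
    """Throughput implied by processing `megapixels` in `milliseconds`."""
    return megapixels / (milliseconds / 1000.0)


if __name__ == "__main__":
    # Reported savings against conventional codecs and JPEG AI.
    for factor in (2.3, 3.0):
        print(f"{factor}x saving -> {savings_factor_to_percent(factor):.1f}% fewer bits")
    # Reported savings against other learned codecs.
    for percent in (20.0, 40.0):
        print(f"{percent:.0f}% saving -> {percent_to_savings_factor(percent):.2f}x")
    # Reported 12-megapixel timings on the phone.
    print(f"encode: {megapixels_per_second(12, 230):.1f} MP/s")
    print(f"decode: {megapixels_per_second(12, 150):.1f} MP/s")
```

Under that reading, 2.3x to 3x works out to roughly 56.5 to 66.7 percent fewer bits, the 20 to 40 percent figure corresponds to about 1.25x to 1.67x, and the reported timings imply about 52 megapixels per second for encode and 80 for decode.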
But the number that matters most isn't a number at all. It's the method behind it: the savings are measured through subjective user studies, not PSNR. Apple optimized for, and validated against, what humans actually perceive.
Why is the optimization target the real story?
Because what a codec optimizes for defines what it becomes, and Apple chose perception over pixel fidelity. The team is explicit about the reasoning. The key advantage of learned codecs over hand-engineered ones, they argue, is the ability to be optimized directly for the task at hand, which is usually to appeal to the human visual system (Rippel et al., 2026).
Read that again, because it's the whole point. A traditional codec is built to minimize a mathematical distortion. A learned codec can be built to minimize perceived distortion, the error a person actually notices, which is a different and better target. PICO is a flagship example of a codec designed around that distinction rather than around rate-distortion curves.
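One way to make that distinction concrete is the generic rate-distortion training objective commonly written for learned codecs. This is the textbook form, not the loss reported in the PICO paper. The symbols are our own notation: x is the source image, x-hat the reconstruction, R the bit cost, lambda the trade-off weight, and d the distortion measure.

```latex
\mathcal{L} = R + \lambda \, d(x, \hat{x}),
\qquad
d_{\mathrm{MSE}}(x, \hat{x}) = \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2
```

Swapping the pixel-wise d_MSE for a distortion that tracks what viewers notice leaves the shape of the objective unchanged. Only what counts as error changes, and that is the choice this section is about.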
This is the argument we have made repeatedly: file size at a fixed pixel-error target is the wrong objective, and perceived quality is the right standard. Seeing Apple commit an entire codec, and a neural architecture search across millions of configurations, to the perceptual-quality target is about as strong a validation of that position as the field can offer.
What about the latency half?
This is the part that should not be overlooked, because Apple treated runtime as a first-class objective, not an afterthought. PICO wasn't optimized for perceptual quality and then checked for speed. The architecture search was performance-aware from the start, searching for configurations that meet an on-device latency budget while maximizing compression.
That ordering matters. Most learned-codec research optimizes the rate-distortion curve and reports latency as a caveat at the end, which is how the field ended up with beautiful BD-rate numbers attached to models that decode too slowly to ship. PICO inverts it: the latency budget is a constraint the design has to satisfy, and quality is maximized within it.
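The difference between the two orderings can be sketched in a few lines. This is a toy illustration, not Apple's search procedure: the `Candidate` type, the scores, and the latencies are all made up, and a real search over millions of configurations would not enumerate a short list like this.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical model configuration with already-measured properties."""
    name: str
    decode_ms: float         # measured on-device decode latency
    perceptual_score: float  # higher is better; stand-in for a perceptual metric


def pick_quality_first(candidates: Iterable[Candidate]) -> Candidate:
    """Quality-first ordering: maximize the score, report latency afterwards."""
    return max(candidates, key=lambda c: c.perceptual_score)


def pick_within_budget(
    candidates: Iterable[Candidate], budget_ms: float
) -> Optional[Candidate]:
    """Latency-first ordering: drop anything over budget, then maximize the score."""
    feasible = [c for c in candidates if c.decode_ms <= budget_ms]
    if not feasible:
        return None  # nothing satisfies the constraint
    return max(feasible, key=lambda c: c.perceptual_score)


# Made-up numbers for illustration only; not taken from the paper.
CANDIDATES = [
    Candidate("large", decode_ms=900.0, perceptual_score=0.95),
    Candidate("medium", decode_ms=140.0, perceptual_score=0.91),
    Candidate("small", decode_ms=60.0, perceptual_score=0.84),
]

if __name__ == "__main__":
    print(pick_quality_first(CANDIDATES).name)                    # large
    print(pick_within_budget(CANDIDATES, budget_ms=150.0).name)   # medium
```

With these invented numbers, the quality-first pick is the model that decodes in 900 ms, while the budgeted pick gives up a little score for a model that actually fits a 150 ms limit.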
It's the same lesson that separates a benchmark winner from a production codec. A model that looks better but decodes slowly isn't shippable, and the strongest design treats speed as a requirement rather than a hope. Apple, optimizing for on-device decode on a phone, clearly reached the same conclusion: perception and latency are the two axes that decide whether a codec is real.
What does this mean for everyone else?
It means the perceptual-and-latency-first approach is no longer a contrarian position. It's where the most resourced player in consumer hardware just placed its research. When Apple builds a codec around perceived quality and on-device speed, that's a signal about where compression is heading, and a strong hint that optimizing purely for rate-distortion is optimizing for the past.
For platforms and teams moving images at scale, the implication is practical. The question stops being "which codec has the best BD-rate" and becomes "which approach delivers the quality people perceive, fast enough, on the hardware they actually use." That reframing, from a math contest to a perception-and-delivery problem, is the shift underneath all of this.
Frequently Asked Questions
What is Apple's PICO codec?
PICO, or Perceptual Image Codec, is a learned image codec introduced by Apple researchers in a May 2026 paper. It is jointly optimized for perceptual quality and on-device runtime, and reports 2.3 to 3x bitrate savings over codecs including AV1, AV2, VVC, ECM, and JPEG AI, validated through subjective user studies.
How is PICO different from JPEG AI or VVC?
The main difference is the optimization target. PICO is designed around perceptual quality and runtime together, validated against human perception rather than pixel-accuracy metrics like PSNR. It reports meaningful bitrate savings over JPEG AI, VVC, AV1, AV2, and ECM while running fast enough to encode and decode large images on a phone.
Does PICO mean neural compression is production-ready?
It is strong evidence that perceptual, latency-aware neural compression is becoming practical for on-device use. PICO reports encoding a 12-megapixel image in as little as 230 milliseconds and decoding it in 150 milliseconds on an iPhone 17 Pro Max, which puts it in a different latency class from most research codecs. Production readiness still depends on the specific workload and deployment.
Why does optimizing for perception matter?
Because human vision does not weigh every pixel equally, and a codec optimized for pixel error can spend bits on fidelity no one perceives while missing what people actually notice. Optimizing for perceived quality targets the error that matters to a viewer, which is why it tends to produce smaller files that also look better.
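A toy example shows why a pixel-error metric can miss this. The sketch below builds two distorted copies of a flat gray 8x8 image, one with every pixel nudged slightly brighter and one with a single pixel pushed far off, and computes standard PSNR for each. The image and the distortions are invented for illustration, and the claim that a viewer would judge them differently is our assumption, not a measured result.

```python
import math
from typing import List

Image = List[List[int]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error over all pixels of two same-sized images."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    return total / count


def psnr(a: Image, b: Image, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    error = mse(a, b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)


SIZE = 8
reference = [[128] * SIZE for _ in range(SIZE)]

# Distortion A: every pixel is 4 levels brighter (64 pixels * 4^2 = 1024).
uniform_shift = [[128 + 4] * SIZE for _ in range(SIZE)]

# Distortion B: one pixel is 32 levels brighter (1 pixel * 32^2 = 1024).
single_speck = [row[:] for row in reference]
single_speck[3][3] = 128 + 32

if __name__ == "__main__":
    print(f"uniform shift: {psnr(reference, uniform_shift):.2f} dB")
    print(f"single speck:  {psnr(reference, single_speck):.2f} dB")
```

Both distortions have the same total squared error, so PSNR scores them identically at about 36.09 dB, even though one is a faint global shift and the other is a single visible speck. A metric that cannot tell those apart cannot tell a codec which one to avoid.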
The bottom line
The compression numbers in Apple's PICO paper are impressive, but they are not the headline. The headline is the design philosophy: a major codec, built and validated around perceptual quality and on-device latency rather than rate-distortion alone. That is a deliberate statement about what compression is actually for, and it matches where the problem has been heading.
File size was never the goal. Perceived quality, delivered fast enough on real hardware, is. When Apple commits a research program to that target, it stops being a point of view and starts being the direction of the field.
Inverity