This tool uses more clever math to replace what's missing: https://github.com/victorvde/jpeg2png
The problem with that approach, however, is that DCT scaling is block-based: with classic 4:2:0 subsampling, each 16x16 chroma block in the original image is individually downscaled to 8x8 and, perhaps more importantly, later individually upscaled back to 16x16 on decompression.
Compared to classic image resizing algorithms (bilinear scaling or whatever), this block-based upscaling can and does introduce additional visual artefacts at the block boundaries, which, while somewhat subtle, are still borderline visible even without pixel-peeping. ([3] notes that the visual differences between libjpeg 6b/turbo and libjpeg 7-9 on image decompression are indeed of a borderline visible magnitude.)
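To make the mechanism concrete, here's a minimal sketch (1-D, pure Python, my own toy rather than libjpeg's actual code) of DCT-domain scaling: take the DCT of a block, keep only the lowest coefficients, and inverse-transform at the smaller size. Each block is handled independently, which is exactly why seams can appear at block boundaries.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of the orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def dct_downscale(block, m):
    """Downscale a block to m samples by truncating its DCT coefficients
    (1-D version of the per-block trick described above)."""
    n = len(block)
    return [v * math.sqrt(m / n) for v in idct(dct(block)[:m])]

# Two adjacent 16-sample blocks of one smooth ramp: each is resampled
# with no knowledge of its neighbour, so the seam need not stay smooth.
ramp = [float(i) for i in range(32)]
left = dct_downscale(ramp[:16], 8)
right = dct_downscale(ramp[16:], 8)
```

A classic resampler would see both halves of the ramp at once; here, each block is scaled in isolation, which is the recipe for the boundary artefacts discussed above.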
I stumbled across this detail after having finally upgraded my image editing software [4] from the old freebie version I'd been using for years (it was included with a computer magazine at some point) to its current incarnation, which came with a libjpeg version upgrade under the hood. Not long afterwards I noticed that for quite a few images, the new version introduced some additional blockiness when decoding JPEG images (also subsequently exacerbated by some particular post-processing steps I was doing on those images), and then I somehow stumbled across this article [3] which noted the change in chroma subsampling and provided the crucial clue to this riddle.
Thankfully, the developers of that image editor were (and still are) very friendly and responsive, and actually agreed to switch out the JPEG library for libjpeg-turbo, resolving the issue. Luckily, few other programs and operating systems seem to actually use modern libjpeg, usually preferring libjpeg-turbo or something else that sticks to regular image scaling algorithms for chroma subsampling.
[1] Instead of libjpeg-turbo or whatever else is around these days.
[2] Which might be true in theory, but I tried de- and recompressing images in a loop with both libjpeg 6b and 9e, and didn't find a significant difference in the number of iterations required until the image converged to a stable compression result.
[3] https://informationsecurity.uibk.ac.at/pdfs/BHB2022_IHMMSEC....
[4] PhotoLine
You're not seeing the actual details either way.
The blurred version feels honest -- it's not showing you anything more than what has been encoded.
The sharp image feels confusing -- it's showing you a ton of detail that is totally wrong. "Detail" that wasn't in the original, but is just artifacts.
Why would you prefer distracting artifacts over a blurred version?
Later compression algorithms were focused on video, where the aim was to have good-enough low-quality approximations.
Deblocking is an inelegant hack.
Deblocking hurts high-quality compression of still images, because it makes it harder for codecs to precisely reproduce the original image. Blurring removes details that the blocks produced, so the codec has to either disable deblocking or compensate with exaggerated contrast (which is still an approximation). It also adds a dependency across blocks, which turns the problem from an independent per-block computation into finding a global optimum that flips between the frequency domain and pixel-space hacks. It's no longer a neat mathematical transform with a closed-form solution, but a pile of iterative guesswork (or it's just not taken into account at all, and the codec wins benchmarks on PSNR, looks good in side-by-side comparisons at a 10% quality level, but is an auto-airbrushing, texture-destroying annoyance when used for real images).
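As a toy illustration of why deblocking is a pixel-domain hack, here's a naive filter (my own sketch, not any real codec's loop filter): it simply nudges the two pixels straddling each block boundary toward their shared average.

```python
def deblock(row, block=8, strength=0.5):
    """Naive deblocking: pull the two pixels straddling each block
    boundary toward their average. A sketch, not a real loop filter."""
    out = list(row)
    for b in range(block, len(out), block):
        avg = (out[b - 1] + out[b]) / 2
        out[b - 1] += strength * (avg - out[b - 1])
        out[b] += strength * (avg - out[b])
    return out

# A hard step at a block boundary gets softened into a smaller step:
step = [0.0] * 8 + [10.0] * 8
softened = deblock(step)
```

Note that the output of each block now depends on its neighbour's reconstruction, which is exactly the cross-block dependency described above; real codecs additionally make the filter conditional on boundary strength and quantizer, adding more special cases.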
The Daala project tried to reinvent it with better mathematical foundations (lapped transforms), but in the end a post-processing pass of blurring the pixels has won.
Deblocking is inelegant, but blur is a much less noticeable artifact than blocks. That said, the best answer turns out to be having the input image in 10 bit and having encoders/decoders work at higher internal bit depths, which allows the encoder to make smarter choices about what detail is real and gives the decoder some info from which it can more intelligently dither the decoded image.
IIUC AV2 is trying to resurrect the Daala deblocking work. I think JPEG XL also has some good stuff here (but I don't remember exactly what).
"Inelegant" is the right word; it's hard to shake off the feeling that we might have missed something important. I suspect the next big breakthrough might be waiting for researchers to focus on lower-quality compression specifically, rather than requiring every new codec to improve the state of the art in near-lossless compression.
JPEG-XL already does this because it uses VarDCT (Variable-size Discrete Cosine Transform) aka adaptive block sizes (2×2 up to 256×256). Large smooth areas use huge blocks and fine detail uses small blocks to preserve detail. JXL spends bits where your eyes care most instead of evenly across the image. It also has many techniques it uses to really focus on keeping edges sharp.
I think we're badly in need of an entirely new image compression technique; the block-based DCT has serious flaws, such as its high coding cost for edges and its tendency to create block artefacts. The modern hardware landscape is quite different from 1992, so it's plausible that the original researchers might have missed something important, all those years ago.
I wonder if other species would look at our images or listen to our sounds and register with horror all the gaping holes everywhere.
In particular, dogs:
> While people have an image frame rate of around 15-20 images per second to make moving pictures appear seamless, canine vision means that dogs need a frame rate of about 70 images per second to perceive a moving image clearly.
> This means that for most of television’s existence – when TVs were powered by cathode ray tubes – dogs couldn’t recognize themselves reliably on a TV screen, meaning your pups mostly missed out on Wishbone, Eddie from Frasier and Full House’s Comet.
> With new HDTVs, however, it’s possible that they can recognize other dogs onscreen.
Source: https://dogoday.com/2018/08/30/dog-vision-can-allow-recogniz...
If I watch a video in 10fps it looks shite but I still recognise everything on screen
You can understand that something below the perception threshold is supposed to be a creature because you both have a far more advanced brain and have been exposed to such things your entire life, so there's a learned component; but your dog may simply not be capable of making the leap to comprehending that something it doesn't see as living/moving is supposed to be representative of a creature at all.
I've personally seen something adjacent to this in action, as I had a dog over the period of time where I transitioned from lower framerate displays to higher framerate displays. The dog was never all that interested in the lower framerate displays, but the higher framerate displays would clearly capture his attention to the point he'd start barking at it when there were dogs on screen.
This is also pretty evident in simple popular culture. The myth that "dogs can't see 2D" where 2D was a standin for movies and often television was pervasive decades ago. So much so that (as an example) in the movie Turner and Hooch from 1989, Tom Hanks offhandedly makes a remark about how the dog isn't enjoying a movie because "dogs can't see 2D" and no further elaboration on it is needed or given; whereas today it's far more common to see content where dogs react to something being shown on a screen, and if you're under, say, 30 or so, you may not have ever even heard of "dogs can't see 2D".
This video has some great slow-mo of CRTs https://www.youtube.com/watch?v=3BJU2drrtCM&t=160s
This is just...wrong? Human vision is much faster and more sensitive than we give it credit for. E.g. humans can discern PWM frequencies up to many thousands of Hz. https://www.youtube.com/watch?v=Sb_7uN7sfTw
> make moving pictures appear seamless
True enough.
NTSC is 30fps, while PAL is 25fps.
The overwhelming majority of people were happy enough to spend, what, billions on screens and displays capable of displaying motion picture in those formats.
That there is evidence that most(?) people are able to sense high-frequency PWM signals doesn’t invalidate the claim that 15 to 20 frames per second is sufficient to make moving pictures appear seamless.
I’ve walked in to rooms where the LED lighting looks fine to me, and the person I was with has stopped, said “nope” and turned around and walked out, because to them the PWM driver LED lighting makes the room look illuminated by night club strobe lighting.
That doesn’t invalidate my experience.
That's not really right. Most NTSC content is either 60 fields per second with independent fields (video camera sourced) or 24 frames per second with 3:2 pulldown (film sourced). It's pretty rare to have content that's actually 30 frames per second broken into even and odd fields. Early video game systems ran essentially 60p @ half the lines; they would put out all even or all odd fields, so there wasn't interlacing.
If you deinterlace 60i content with a lot of motion to 30p by just combining two adjacent fields, it typically looks awful, because each field is an independent sample. Works fine enough with low motion though.
PAL is similar, although 24 fps films were often simply sped up and shown at 25 fps, avoiding the judder of showing most frames as two fields but two frames per second as three fields.
I think most people find 24 fps film motion acceptable (although classical film projection generally shows each frame two or three times, so it's 48/72 Hz with updates at 24 fps), but a lot of people can tell a difference between 'film look' and 'tv look' at 50/60 fields (or frames) per second.
Filmmakers generally like their films to look like film and high frame rate films are rare and get mixed reviews.
Some TV shows are recorded and presented in 24 fps to appear more cinematic (Stargate SG-1 is an example).
It's more complicated in other countries (the BBC liked to shoot on video a lot) but it was standard practice in the States.
Curiously, I can already get in this mindset with 24fps videos and much, much prefer the clarity of motion 48fps offers. All the complaining annoyed me, honestly. It reminds me of people complaining about "not being able to see things in dark scenes", which completely hampers the filmmaker's ability to exploit high dynamic range.
Tbf, in both cases the consumer hardware can play a role in making this look bad.
[1] Technically 29.97fps, but the interlacing gives 59.94 fields per second.
The maximum frame rate we can perceive is much higher, for regular video it's probably somewhere around 400-800.
While 24-30fps might suffice for basic motion, the biological impact of refresh rates on eye strain (especially for neurodivergent users) is a real engineering challenge. This is why I've been pushing for WCAG 2.1 AAA standards in my latest project; it’s not just about 'seeing' the image, but about minimizing the neurological stress of the interaction itself.
We get blue tennis balls for our pups instead of green; but they aren’t the fetching kind so not sure if it helps.
Everything is tuned for the human audible range, so dogs will miss out on the higher-frequency stuff. Humans did OK with POTS @ 8kHz with a 300-3400Hz band-pass filter, though. The internet says dog hearing goes up to ~60 kHz; most audio equipment tuned for humans won't go anywhere near that, but probably cleanly carrying high frequencies up to the limit of the equipment would be better than psychoacoustic compression tuned for humans.
Wavelet compression is better than the block-based DCT for preserving sharp edges and gradients, but worse for preserving fine texture (noise). The DCT can emulate noise by storing just a couple of high-frequency coefficients for a 64-pixel block, but the DWT would need to store dozens of coefficients to achieve noise synthesis of similar quality.
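A quick way to see that asymmetry (my own 1-D toy, not any real codec): a single high-frequency DCT coefficient reconstructs to an oscillation spread across the whole block, i.e. cheap synthetic texture, while a Haar wavelet decomposition of that same signal needs many nonzero detail coefficients to represent it.

```python
import math

def idct(X):
    """Inverse orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def haar(x):
    """Full orthonormal Haar wavelet decomposition (length a power of two)."""
    x = list(x)
    coeffs = []
    while len(x) > 1:
        avg = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        det = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        coeffs = det + coeffs
        x = avg
    return x + coeffs

# One high-frequency DCT coefficient buys a block-wide oscillating "texture"...
texture = idct([0, 0, 0, 0, 0, 0, 0, 10.0])

# ...while the Haar transform of that same signal spreads the energy over
# many detail coefficients.
nonzero = sum(1 for c in haar(texture) if abs(c) > 1e-6)
```

One number in the DCT domain versus a handful of nonzero wavelet coefficients, even in this tiny 8-sample case; in a real 8x8 block the gap is correspondingly larger.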
The end result is that JPEG and JPEG 2000 achieve roughly the same lossy compression ratio before image artefacts show up. JPEG blurs edges, JPEG 2000 blurs texture. At very low bitrates, JPEG becomes blocky, and JPEG 2000 looks like a low-resolution image which has been upscaled (because it's hardly storing any residuals at all!)
FFmpeg has a `jpeg2000` codec; if you're interested in image compression, running a manual comparison between JPEG and JPEG 2000 is a worthwhile way to spend an hour or two.
Since JPEG, improvements have included better lossless compression (entropy coding) of the DCT coefficients; deblocking filters, which blur the image across block boundaries; predicting the contents of DCT blocks from their neighbours, especially prediction of sharp edges; variable DCT block sizes, rather than a fixed 8x8 grid; the ability to compress some DCT blocks more aggressively than others within the same image; encoding colour channels together, rather than splitting them into three completely separate images; and the option to synthesise fake noise in the decoder, since real noise can't be compressed.
You might be interested in this paper: https://arxiv.org/pdf/2506.05987. It's a very approachable summary of JPEG XL, which is roughly the state of the art in still-image compression.
JPEG has 8x8 blocks, modern codecs have variable-sized blocks from 4x4 to 128x128.
JPEG has RLE+Huffman, modern codecs have context-adaptive variations of arithmetic coding.
JPEG has a single quality scale for the whole image, modern codecs allow quality to be tweaked in different areas of the image.
JPEG applies block coefficients on top of a single flat color per block (the DC coefficient); modern codecs use a "prediction" made by smearing the previous couple of blocks as the starting point.
They're JPEGs with more of everything.
Seems these days there's more of a preference to outright refuse invalid files, since they could be exploit attempts.
It's related to it, but not "literally modeled" on it. This number is from experiments where people are asked to equalize the perceived brightness of two lights with different colors. The results are then averaged out and interpolated using polynomials to create a color model [0].
This is different for video: since video uses a whole lot more bandwidth and storage, we are more ready to accept newer standards.
That's where webp comes from, the idea is that images are like single frame videos and that we could use a video codec (webm/VP8) for still images, and it will be more efficient than JPEG.
That's also the reason why JPEG-XL is taking so long to be adopted. Because efficient video codecs are important, browsers want to support webm, and they get webp almost for free. JPEG-XL is an entirely new format just for still images, it is complex and unlike with video, there is no strong need for a better still image format.
In Chrome you can enable JXL from here: chrome://flags/#enable-jxl-image-format
You can track Firefox progress from here: https://bugzilla.mozilla.org/show_bug.cgi?id=1539075
However, I do have to give one bit of critique: it also makes my laptop fans spin like crazy even when nothing is happening at all.
Now, this is not intended as a critique of the author. I'm assuming that she used some framework to get the results out quickly, and that there is a bug in how that framework handles events and reactivity. But it would still be nice if whatever causes this issue could be fixed. It would be sad if the website had the same issue on mobile and caused my phone battery to drain quickly when 90% of the time is spent reading text and watching graphics that don't change.
I've been experimenting with a 'Zero-Framework' approach for a biotech project recently, precisely to avoid this. By sticking to Vanilla JS and native APIs (like Blob for real-time PDF generation), I managed to keep the entire bundle under 20KB with a 0.3s TTI.
We often forget that for users on legacy devices or unstable 3G/Edge connections, a 'heavy' interactive page isn't just slow, it's inaccessible. Simplicity shouldn't just be an aesthetic choice, but a core engineering requirement for global equity.
But for reference, keeping CNN.com open is more than double that memory pressure on my 5 year old Mac laptop, and it handles both fine. Do your fans really kick in for heavy sites?
This reminds me of the sort of work Nayuki does: https://www.nayuki.io
seems like website doesn't work without webgl enabled... why?
Wtf? I can't read your blog because I use Qutebrowser?
There is also the AVIF format, which is newer and better, but it still needs to mature a bit with better support/compatibility.
If you are hosting images it is nice to use avif and fallback to webp.
(Yes, I know, I should just make a folder action on Downloads that converts them with some CLI tool, but it makes me sad that this only further degrades their quality.)
Most social media sites take WebP these days no issue; it's mostly older, often PHP-based sites that struggle, as far as I'm aware. And when it cuts down bandwidth by a sizeable amount, there are network effects that tend to push some level of adoption of more modern image formats.
To be clear, PNG only supports lossless compression, while WebP has separate lossy and lossless modes. AVIF can do lossless compression too, but you're usually better off using WebP or PNG (if you need >8 bpc) instead as it really isn't good at that.
I’m sure Google has stats about “right click save as”
I made a notebook a few years back which lets you play with / filter the DCT coefficients of an image: https://observablehq.com/d/167d8f3368a6d602
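For anyone who'd rather poke at the same idea offline, here's a tiny pure-Python equivalent of the notebook's core operation (1-D for brevity, my own sketch): transform, zero out coefficients, reconstruct.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of the orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def keep_lowest(x, k):
    """Zero out all but the lowest k DCT coefficients and reconstruct."""
    return idct([v if i < k else 0.0 for i, v in enumerate(dct(x))])

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
smooth = keep_lowest(signal, 3)  # low-pass: fine detail is smoothed away
exact = keep_lowest(signal, 8)   # keeping all coefficients is lossless
```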