This tool uses more clever math to replace what's missing: https://github.com/victorvde/jpeg2png
The problem with that approach, however, is that DCT scaling is block-based: with classic 4:2:0 subsampling, each 16x16 chroma block in the original image is individually downscaled to 8x8 and, perhaps more importantly, later individually upscaled back to 16x16 on decompression.
Compared to classic image resizing algorithms (bilinear scaling or whatever), this block-based upscaling can and does introduce additional visual artefacts at the block boundaries, which, while somewhat subtle, are still borderline visible even without pixel-peeping. ([3] notes that the visual differences between libjpeg 6b/turbo and libjpeg 7-9 on image decompression are indeed of a borderline visible magnitude.)
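To make the mechanism concrete, here's a minimal sketch (1-D, pure Python, my own toy rather than libjpeg's actual code) of DCT-domain scaling: take the DCT of a block, keep only the lowest coefficients, and inverse-transform at the smaller size. Each block is handled independently, which is exactly why seams can appear at block boundaries.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of the orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def dct_downscale(block, m):
    """Downscale a block to m samples by truncating its DCT coefficients
    (1-D version of the per-block trick described above)."""
    n = len(block)
    return [v * math.sqrt(m / n) for v in idct(dct(block)[:m])]

# Two adjacent 16-sample blocks of one smooth ramp: each is resampled
# with no knowledge of its neighbour, so the seam need not stay smooth.
ramp = [float(i) for i in range(32)]
left = dct_downscale(ramp[:16], 8)
right = dct_downscale(ramp[16:], 8)
```

A classic resampler would see both halves of the ramp at once; here, each block is scaled in isolation, which is the recipe for the boundary artefacts discussed above.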
I stumbled across this detail after having finally upgraded my image editing software [4] from the old freebie version I'd been using for years (it was included with a computer magazine at some point) to its current incarnation, which came with a libjpeg version upgrade under the hood. Not long afterwards I noticed that for quite a few images, the new version introduced some additional blockiness when decoding JPEG images (also subsequently exacerbated by some particular post-processing steps I was doing on those images), and then I somehow stumbled across this article [3] which noted the change in chroma subsampling and provided the crucial clue to this riddle.
Thankfully, the developers of that image editor were (and still are) very friendly and responsive, and actually agreed to switch out the JPEG library for libjpeg-turbo, resolving the issue. Luckily, few other programs and operating systems seem to actually use modern libjpeg, usually preferring libjpeg-turbo or something else that sticks to regular image scaling algorithms for chroma subsampling.
[1] Instead of libjpeg-turbo or whatever else is around these days.
[2] Which might be true in theory, but I tried de- and recompressing images in a loop with both libjpeg 6b and 9e, and didn't find a significant difference in the number of iterations required until the image converged to a stable compression result.
[3] https://informationsecurity.uibk.ac.at/pdfs/BHB2022_IHMMSEC....
[4] PhotoLine
You're not seeing the actual details either way.
The blurred version feels honest -- it's not showing you anything more than what has been encoded.
The sharp image feels confusing -- it's showing you a ton of detail that is totally wrong. "Detail" that wasn't in the original, but is just artifacts.
Why would you prefer distracting artifacts over a blurred version?
Later compression algorithms were focused on video, where the aim was to have good-enough low-quality approximations.
Deblocking is an inelegant hack.
Deblocking hurts high-quality compression of still images, because it makes it harder for codecs to precisely reproduce the original image. Blurring removes details that the blocks produced, so the codec has to either disable deblocking or compensate with exaggerated contrast (which is still an approximation). It also adds a dependency across blocks, which turns the problem from an independent per-block computation into finding a global optimum that flips between the frequency domain and pixel-space hacks. It's no longer a neat mathematical transform with a closed-form solution, but a pile of iterative guesswork (or it's just not taken into account at all, and the codec wins benchmarks on PSNR, looks good in side-by-side comparisons at a 10% quality level, but is an auto-airbrushing, texture-destroying annoyance when used for real images).
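As a toy illustration of why deblocking is a pixel-domain hack, here's a naive filter (my own sketch, not any real codec's loop filter): it simply nudges the two pixels straddling each block boundary toward their shared average.

```python
def deblock(row, block=8, strength=0.5):
    """Naive deblocking: pull the two pixels straddling each block
    boundary toward their average. A sketch, not a real loop filter."""
    out = list(row)
    for b in range(block, len(out), block):
        avg = (out[b - 1] + out[b]) / 2
        out[b - 1] += strength * (avg - out[b - 1])
        out[b] += strength * (avg - out[b])
    return out

# A hard step at a block boundary gets softened into a smaller step:
step = [0.0] * 8 + [10.0] * 8
softened = deblock(step)
```

Note that the output of each block now depends on its neighbour's reconstruction, which is exactly the cross-block dependency described above; real codecs additionally make the filter conditional on boundary strength and quantizer, adding more special cases.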
The Daala project tried to reinvent it with better mathematical foundations (lapped transforms), but in the end a post-processing pass of blurring the pixels has won.
Deblocking is inelegant, but blur is a much less noticeable artifact than blocks. That said, the best answer turns out to be having the input image in 10 bit and having encoders/decoders work at higher internal bit depths, which allows the encoder to make smarter choices about what detail is real and gives the decoder some info from which it can more intelligently dither the decoded image.
IIUC AV2 is trying to resurrect the Daala deblocking work. I think JPEG XL also has some good stuff here (but I don't remember exactly what).
"Inelegant" is the right word; it's hard to shake off the feeling that we might have missed something important. I suspect the next big breakthrough might be waiting for researchers to focus on lower-quality compression specifically, rather than requiring every new codec to improve the state of the art in near-lossless compression.
JPEG-XL already does this because it uses VarDCT (Variable-size Discrete Cosine Transform) aka adaptive block sizes (2×2 up to 256×256). Large smooth areas use huge blocks and fine detail uses small blocks to preserve detail. JXL spends bits where your eyes care most instead of evenly across the image. It also has many techniques it uses to really focus on keeping edges sharp.
I think we're badly in need of an entirely new image compression technique; the block-based DCT has serious flaws, such as its high coding cost for edges and its tendency to create block artefacts. The modern hardware landscape is quite different from 1992, so it's plausible that the original researchers might have missed something important, all those years ago.
I wonder if other species would look at our images or listen to our sounds and register with horror all the gaping holes everywhere.
In particular, dogs:
> While people have an image frame rate of around 15-20 images per second to make moving pictures appear seamless, canine vision means that dogs need a frame rate of about 70 images per second to perceive a moving image clearly.
> This means that for most of television’s existence – when TVs were powered by cathode ray tubes – dogs couldn’t recognize themselves reliably on a TV screen, meaning your pups mostly missed out on Wishbone, Eddie from Frasier and Full House’s Comet.
> With new HDTVs, however, it’s possible that they can recognize other dogs onscreen.
Source: https://dogoday.com/2018/08/30/dog-vision-can-allow-recogniz...
If I watch a video in 10fps it looks shite but I still recognise everything on screen
You can understand that something below the perception threshold is supposed to be a creature because you both have a far more advanced brain and have been exposed to such things your entire life, so there's a learned component; but your dog may simply not be capable of making the leap to comprehending that something it doesn't see as living/moving is supposed to be representative of a creature at all.
I've personally seen something adjacent to this in action, as I had a dog over the period of time where I transitioned from lower framerate displays to higher framerate displays. The dog was never all that interested in the lower framerate displays, but the higher framerate displays would clearly capture his attention to the point he'd start barking at it when there were dogs on screen.
This is also pretty evident in simple popular culture. The myth that "dogs can't see 2D" where 2D was a standin for movies and often television was pervasive decades ago. So much so that (as an example) in the movie Turner and Hooch from 1989, Tom Hanks offhandedly makes a remark about how the dog isn't enjoying a movie because "dogs can't see 2D" and no further elaboration on it is needed or given; whereas today it's far more common to see content where dogs react to something being shown on a screen, and if you're under, say, 30 or so, you may not have ever even heard of "dogs can't see 2D".
This video has some great slow-mo of CRTs https://www.youtube.com/watch?v=3BJU2drrtCM&t=160s
This is just...wrong? Human vision is much faster and more sensitive than we give it credit for. E.g. humans can discern PWM frequencies up to many thousands of Hz. https://www.youtube.com/watch?v=Sb_7uN7sfTw
> make moving pictures appear seamless
True enough.
NTSC is 30fps, while PAL is 25fps.
The overwhelming majority of people were happy enough to spend, what, billions on screens and displays capable of displaying motion picture in those formats.
That there is evidence that most(?) people are able to sense high-frequency PWM signals doesn’t invalidate the claim that 15 to 20 frames per second is sufficient to make moving pictures appear seamless.
I’ve walked in to rooms where the LED lighting looks fine to me, and the person I was with has stopped, said “nope” and turned around and walked out, because to them the PWM driver LED lighting makes the room look illuminated by night club strobe lighting.
That doesn’t invalidate my experience.
That's not really right. Most NTSC content is either 60 fields per second with independent fields (video camera sourced) or 24 frames per second with 3:2 pulldown (film sourced). It's pretty rare to have content that's actually 30 frames per second broken into even and odd fields. Early video game systems ran essentially 60p @ half the lines; they would put out all even or all odd fields, so there wasn't interlacing.
If you deinterlace 60i content with a lot of motion to 30p by just combining two adjacent fields, it typically looks awful, because each field is an independent sample. Works fine enough with low motion though.
PAL is similar, although 24 fps films were often simply sped up and shown at 25 fps, avoiding the judder of showing most frames as two fields but two frames per second as three fields.
I think most people find 24 fps film motion acceptable (although classical film projection generally shows each frame two or three times, so it's 48/72 Hz with updates at 24 fps), but a lot of people can tell a difference between 'film look' and 'tv look' at 50/60 fields (or frames) per second.
Filmmakers generally like their films to look like film and high frame rate films are rare and get mixed reviews.
Some TV shows are recorded and presented in 24 fps to appear more cinematic (Stargate SG-1 is an example).
It's more complicated in other countries (the BBC liked to shoot on video a lot) but it was standard practice in the States.
Curiously, I can already get in this mindset with 24fps videos and much, much prefer the clarity of motion 48fps offers. All the complaining annoyed me, honestly. It reminds me of people complaining about "not being able to see things in dark scenes", which completely hampers the filmmaker's ability to exploit high dynamic range.
Tbf, in both cases the consumer hardware can play a role in making this look bad.
[1] Technically 29.97fps, but the interlacing gives 59.94 fields per second.
The maximum frame rate we can perceive is much higher, for regular video it's probably somewhere around 400-800.
While 24-30fps might suffice for basic motion, the biological impact of refresh rates on eye strain (especially for neurodivergent users) is a real engineering challenge. This is why I've been pushing for WCAG 2.1 AAA standards in my latest project; it’s not just about 'seeing' the image, but about minimizing the neurological stress of the interaction itself.
We get blue tennis balls for our pups instead of green; but they aren’t the fetching kind so not sure if it helps.
Everything is tuned for the human audible range, so dogs will miss out on the higher-frequency stuff. Humans did OK with POTS @ 8kHz with a 300-3400Hz band-pass filter, though. The internet says dog hearing goes up to ~60 kHz; most audio equipment tuned for humans won't go anywhere near that, but probably cleanly carrying high frequencies up to the limit of the equipment would be better than psychoacoustic compression tuned for humans.
Wavelet compression is better than the block-based DCT for preserving sharp edges and gradients, but worse for preserving fine texture (noise). The DCT can emulate noise by storing just a couple of high-frequency coefficients for a 64-pixel block, but the DWT would need to store dozens of coefficients to achieve noise synthesis of similar quality.
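A quick way to see that asymmetry (my own 1-D toy, not any real codec): a single high-frequency DCT coefficient reconstructs to an oscillation spread across the whole block, i.e. cheap synthetic texture, while a Haar wavelet decomposition of that same signal needs many nonzero detail coefficients to represent it.

```python
import math

def idct(X):
    """Inverse orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def haar(x):
    """Full orthonormal Haar wavelet decomposition (length a power of two)."""
    x = list(x)
    coeffs = []
    while len(x) > 1:
        avg = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        det = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
        coeffs = det + coeffs
        x = avg
    return x + coeffs

# One high-frequency DCT coefficient buys a block-wide oscillating "texture"...
texture = idct([0, 0, 0, 0, 0, 0, 0, 10.0])

# ...while the Haar transform of that same signal spreads the energy over
# many detail coefficients.
nonzero = sum(1 for c in haar(texture) if abs(c) > 1e-6)
```

One number in the DCT domain versus a handful of nonzero wavelet coefficients, even in this tiny 8-sample case; in a real 8x8 block the gap is correspondingly larger.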
The end result is that JPEG and JPEG 2000 achieve roughly the same lossy compression ratio before image artefacts show up. JPEG blurs edges, JPEG 2000 blurs texture. At very low bitrates, JPEG becomes blocky, and JPEG 2000 looks like a low-resolution image which has been upscaled (because it's hardly storing any residuals at all!)
FFmpeg has a `jpeg2000` codec; if you're interested in image compression, running a manual comparison between JPEG and JPEG 2000 is a worthwhile way to spend an hour or two.
Since JPEG, improvements have included better lossless compression (entropy coding) of the DCT coefficients; deblocking filters, which blur the image across block boundaries; predicting the contents of DCT blocks from their neighbours, especially prediction of sharp edges; variable DCT block sizes, rather than a fixed 8x8 grid; the ability to compress some DCT blocks more aggressively than others within the same image; encoding colour channels together, rather than splitting them into three completely separate images; and the option to synthesise fake noise in the decoder, since real noise can't be compressed.
You might be interested in this paper: https://arxiv.org/pdf/2506.05987. It's a very approachable summary of JPEG XL, which is roughly the state of the art in still-image compression.
JPEG has 8x8 blocks, modern codecs have variable-sized blocks from 4x4 to 128x128.
JPEG has RLE+Huffman, modern codecs have context-adaptive variations of arithmetic coding.
JPEG has a single quality scale for the whole image, modern codecs allow quality to be tweaked in different areas of the image.
JPEG applies block coefficients on top of a single flat color per block (the DC coefficient); modern codecs use a "prediction" made by smearing the previous couple of blocks as the starting point.
They're JPEGs with more of everything.
Seems these days there's more of a preference to outright refuse invalid files, since they could be exploit attempts.
It's related to it, but not "literally modeled" on it. This number is from experiments where people are asked to equalize the perceived brightness of two lights with different colors. The results are then averaged out and interpolated using polynomials to create a color model [0].
This is different for video: since video uses a whole lot more bandwidth and storage, we are more ready to accept newer standards.
That's where webp comes from, the idea is that images are like single frame videos and that we could use a video codec (webm/VP8) for still images, and it will be more efficient than JPEG.
That's also the reason why JPEG-XL is taking so long to be adopted. Because efficient video codecs are important, browsers want to support webm, and they get webp almost for free. JPEG-XL is an entirely new format just for still images, it is complex and unlike with video, there is no strong need for a better still image format.
In Chrome you can enable JXL from here: chrome://flags/#enable-jxl-image-format
You can track Firefox progress from here: https://bugzilla.mozilla.org/show_bug.cgi?id=1539075
However, I do have to give one bit of critique: it also makes my laptop fans spin like crazy even when nothing is happening at all.
Now, this is not intended as a critique of the author. I'm assuming that she used some framework to get the results out quickly, and that there is a bug in how that framework handles events and reactivity. But it would still be nice if whatever causes this issue could be fixed. It would be sad if the website had the same issue on mobile and caused my phone battery to drain quickly when 90% of the time is spent reading text and watching graphics that don't change.
I've been experimenting with a 'Zero-Framework' approach for a biotech project recently, precisely to avoid this. By sticking to Vanilla JS and native APIs (like Blob for real-time PDF generation), I managed to keep the entire bundle under 20KB with a 0.3s TTI.
We often forget that for users on legacy devices or unstable 3G/Edge connections, a 'heavy' interactive page isn't just slow, it's inaccessible. Simplicity shouldn't just be an aesthetic choice, but a core engineering requirement for global equity.
But for reference, keeping CNN.com open is more than double that memory pressure on my 5 year old Mac laptop, and it handles both fine. Do your fans really kick in for heavy sites?
This reminds me of the sort of work Nayuki does: https://www.nayuki.io
seems like website doesn't work without webgl enabled... why?
Wtf? I can't read your blog because I use Qutebrowser?
There is also the AVIF format, which is newer and better, but it still needs to mature a bit with better support/compatibility.
If you are hosting images it is nice to use avif and fallback to webp.
(Yes, I know, I should just make a folder action on Downloads that converts them with some CLI tool, but it makes me sad that this only further degrades their quality.)
Most social media sites take WebP these days no issue; it's mostly older, often PHP-based sites that struggle, as far as I'm aware. And when it cuts down bandwidth by a sizeable amount, there are network effects that tend to push some level of adoption of more modern image formats.
To be clear, PNG only supports lossless compression, while WebP has separate lossy and lossless modes. AVIF can do lossless compression too, but you're usually better off using WebP or PNG (if you need >8 bpc) instead as it really isn't good at that.
I’m sure Google has stats about “right click save as”
I made a notebook a few years back which lets you play with / filter the DCT coefficients of an image: https://observablehq.com/d/167d8f3368a6d602
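For anyone who'd rather poke at the same idea offline, here's a tiny pure-Python equivalent of the notebook's core operation (1-D for brevity, my own sketch): transform, zero out coefficients, reconstruct.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    N = len(x)
    return [(math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
            * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse of the orthonormal 1-D DCT-II."""
    N = len(X)
    return [sum((math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N))
                * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]

def keep_lowest(x, k):
    """Zero out all but the lowest k DCT coefficients and reconstruct."""
    return idct([v if i < k else 0.0 for i, v in enumerate(dct(x))])

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
smooth = keep_lowest(signal, 3)  # low-pass: fine detail is smoothed away
exact = keep_lowest(signal, 8)   # keeping all coefficients is lossless
```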