The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.
> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.
The prefill problem goes unnoticed when you're playing around with the LLM in small chats. Once you start trying to use it for bigger pieces of work, the compute limit becomes a real bottleneck.
The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.
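To put the quoted numbers in perspective, here's a quick back-of-envelope sketch in Python. Everything below comes straight from the figures in the quote above; nothing here was measured by me:

    # Prefill throughput implied by the quoted numbers: a 4K-token
    # prompt takes 17 s on the M4 Air vs ~150 ms with the eGPU.
    prompt_tokens = 4096
    m4_prefill_s = 17.0
    egpu_prefill_s = 0.15

    print(prompt_tokens / m4_prefill_s)    # ~241 tokens/s prefill on the M4 Air
    print(prompt_tokens / egpu_prefill_s)  # ~27,000 tokens/s with the eGPU
    print(m4_prefill_s / egpu_prefill_s)   # ~113x (the article rounds to 120x)

    # At that rate, a 16K-token prompt would sit for over a minute
    # before the first generated token appears.
    print(16384 * m4_prefill_s / prompt_tokens)  # ~68 s

And if anything that last number is optimistic, since attention cost grows faster than linearly with prompt length, which is exactly the "worse and worse" effect the quote describes.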
TTFT is way better on the M5, although still nowhere near a 5090 of course. You can see in the article's charts that an M5 Pro significantly beats an M4 Max (while in other workloads the gap would be much smaller, since a Pro is roughly half a Max). I'm not sure how mature the software stack they're using is - my own anecdotal experience since getting an M5 Pro a few weeks ago is that only very specific frameworks/models were properly optimized.
The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general-purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope, but not nearly as fast as a dedicated GPU consuming 400-500W.
Apple added tensor cores in the M5 generation to help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.
Dedicated GPUs like the RTX 5090 are in another league, though.
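A minimal sketch of why bandwidth dominates token generation (decode) while compute dominates prefill. The bandwidth figures below are public specs; the model size is an assumption I picked for illustration:

    # Each generated token streams the model weights through memory once,
    # so decode speed is roughly bandwidth / model size. Illustrative only.
    def decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
        return bandwidth_gb_s / model_gb

    model_gb = 4.5  # assumed: an ~8B-parameter model at 4-bit quantization

    print(decode_tokens_per_s(120, model_gb))   # base M4, ~120 GB/s  -> ~27 tok/s
    print(decode_tokens_per_s(546, model_gb))   # M4 Max, ~546 GB/s   -> ~121 tok/s
    print(decode_tokens_per_s(1792, model_gb))  # RTX 5090, ~1.8 TB/s -> ~398 tok/s

Prefill is the opposite regime: one huge batch of matrix multiplications over the whole prompt, so it's bound by raw compute, which is exactly where tensor cores and a 400-500W power budget pay off.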
You can see the divergence in the high-resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K, where CPU emulation stops being the bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.
Because the GPUs aren't as fantastic as everyone assumes?
> might also be less optimised in MLX?
prefill has gotta be one of the most optimized paths in MLX...
Or, more likely, it will tell you something it doesn't know.
Reminds me of yesterday, when I was arguing with ChatGPT that the 5070 Ti was an actual video card. It kept trying to correct me by saying I must have meant a 4070 Ti, since no such 5070 Ti card exists.
I asked Claude to generate an HTML page about PowerShell 7. It gave me a page saying 7.4 was the latest LTS release. I corrected it with links showing 7.6 was released in March and asked it to regenerate with the latest information.
It generated basically the same page with the same claim that 7.4 was the latest release.
People do this too, though. At least the AI generally tries to follow the instructions you give it, even when you're unclear on the details.
I feel like it's similar to the self-driving car problem. The car could have 99.9999% reliability, drive much better and safer than a human, yet folks will still freak out about a single mistake even though you have actual humans today driving the wrong way down the highway, crashing into buildings, drunk driving, stealing cars, and all sorts of other just absolutely stupid things.
We need to move away from this idea that because it's an AI system it should give you perfect responses. It's not a deterministic system and it can be wrong, though it should get better over time. Your Google search results are wrong all the time too. The NYT writes things that are factually incorrect. Why do we have such high standards for these models when we don't apply them elsewhere?
This is also very bad and people complain about these things all the fucking time.
> Why do we have such a high standard for these models
Because Altman and Amodei are defrauding investors out of hundreds of billions of dollars on the promise that they will replace the entire workforce. Of course people are going to point out the emperor has no clothes when half of our society is engaged in mass hysteria worshipping these fucking things as the next industrial revolution, diverting massive amounts of resources to them, and ruining HN with 10 articles on the front page per day about how software engineering is dead.
So at worst these AI tools are as bad as the existing system. Worth complaining about? Absolutely. Worth holding to much higher standards? Nah, I don't think so. Not at this stage, at least. And folks are just disappointing themselves by setting up straw-man expectations.
These tools are non-deterministic systems (like humans) which sometimes don't do exactly what you want (like humans), but they are also extremely fast, much cheaper (for now), and have domain knowledge much broader than any single human's. Like anything else, there are pros and cons.
It should be reasonably expected that you can give a source and fix an error in the AI's output.
I would even go as far as to say if a human directly told the AI "no, use 7.6 as the latest version", the AI should absolutely follow direct instructions no matter what it thinks is true. What if this human was working on a slide about the upcoming release of 7.6 that has no public documentation?
"Very deep", "border-line impractical" "in a research-sense" is the perfect summary of this article itself! :)
> Important: Codex CLI no longer exists
> OpenAI discontinued the Codex model + CLI a while back. There is no official binary named codex in any current OpenAI npm packages. OpenAI’s current CLI tool is:
npm install -g openai
> which installs the openai command, not codex.

The world knowledge of these models is not necessarily up to date :)
edit: I replayed the same prompt into current ChatGPT and it is less clueless now. Maybe OpenAI noticed that it was utterly dumb that GPT-5.whatever didn't believe that Codex existed and fine-tuned it.
It's amazing how this still needs to be said. Codex was released in April 2025. The initial GPT-5 and 5.1 still had a knowledge cutoff in late 2024. Like, what did you expect? Always beware the knowledge cutoff for LLMs (although recent releases have gotten much better with researching the web for updates before answering modern software topics).
(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD, not NVIDIA. https://support.apple.com/en-us/102363)
Hopefully in 2026, with the Valve Index VR headset being ARM (Qualcomm?), we'll get what you're talking about here - basically Proton for Win32/64 to Linux ARM64.
Side note: Windows on ARM isn't bad, it's just priced out of its league, and cooling is awful for gaming on current laptops. The only issue I had was OpenGL needing some obscure GL-on-DirectX thing for Maya3D to get games to work.
But Valve's ARM efforts even mean that Android devices can play some (mostly less graphically intensive) Steam games. That makes me very excited about the prospects for the future of gaming handhelds.
"no - not in any practical sense today, and "maybe" only in a very deep, borderline-impractical research sense."
This is why humans will always rule over crappy LLMs.
Or if you're referring to how the OP still decided to go ahead, I've seen AIs go ahead on impractical courses of action many times, and surprisingly succeed on some of them.
Congrats! Each one got what they wanted :).
Unfortunately, I also believe that market forces may push away from this direction, as LLM companies try to capture the value stream.
Never let an AI tell you that you cannot do something practical yourself, whether for research, discovery, or fun.
The only thing that is close to impractical is expecting your non-technical friends or others to follow you without any incentive or benefit.
It’s these people, not the ones who refuse to use LLMs, who are as they say, “cooked”.
Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!
What exactly do you feel like macOS is missing?
Will Apple ever make a computer that makes Siracusa happy? (and do you have the "Believe" shirt?)
Even if the drivers loaded, they can't talk to the GPU from within Docker (unless one implements PCI passthrough). macOS owns the PCI bus in this scenario.
Anyway, the Mac Pro is dead now. There are only so many sales that audio and video professionals can provide.
I don't know about that. Apple supported some full-size GPUs in past product lines and the number of users was very small. Granted, LLMs change that demand, but the audience of Mac Pro buyers who would use a full-size GPU (one that's nearly impossible to obtain anyway) is almost nothing compared to their laptop sales.
Part of the reason the new Mac Pro failed to find an audience can definitely be blamed on macOS' hostility to third-party hardware. Who knows what Apple would be worth if they had beaten Nvidia's Grace CPU to the datacenter market. It was certainly their opportunity.