Hacker News

40 points by matt_d 11 hours ago | 8 comments

Why didn't the author compare Llama 3 with GLM 5.2 (released 1 week ago), which is a more standard attention-based LLM? Comparing two separate families of LLMs and then pointing out that they are different is not a surprising result, and it detracts from the point the author is trying to make.

https://sebastianraschka.com/llm-architecture-gallery/?compa...

If you look at it, the diagrams are very similar. The main differences are that the feedforward layer is replaced with an MoE (a router over multiple feedforward networks) and that the model uses a different attention implementation.
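For readers unfamiliar with the "router to multiple feedforwards" idea, here is a minimal sketch of a generic top-k mixture-of-experts layer in pure Python. It is not GLM 5.2's actual implementation. The layer sizes, the ReLU activation, `top_k`, and all names (`MoELayer`, `make_ffn`, etc.) are hypothetical choices for illustration. It picks the top-k experts by router score and then renormalizes their weights with a softmax over just the selected scores, which is one common variant.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(w, x):
    # Multiply matrix w (list of rows) by vector x.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def make_ffn(d_model, d_hidden, rng):
    # One "expert": an ordinary two-layer feedforward network (hypothetical sizes).
    w1 = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def ffn(x):
        return matvec(w2, relu(matvec(w1, x)))

    return ffn


class MoELayer:
    """Replaces a single feedforward block with a router plus several experts."""

    def __init__(self, d_model, d_hidden, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.experts = [make_ffn(d_model, d_hidden, rng) for _ in range(num_experts)]
        # Router: one linear score per expert for a given token vector.
        self.router = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(num_experts)]
        self.top_k = top_k

    def __call__(self, x):
        logits = matvec(self.router, x)
        # Only the top_k highest-scoring experts run for this token.
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        out = [0.0] * len(x)
        for w, i in zip(weights, top):
            y = self.experts[i](x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top


layer = MoELayer(d_model=4, d_hidden=8, num_experts=4, top_k=2)
y, chosen = layer([1.0, 0.5, -0.5, 2.0])
```

The point of the design is that each token only pays for `top_k` experts, even though the layer holds `num_experts` feedforward networks' worth of parameters. A dense model would just call a single `ffn(x)` here.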

alecco 38 seconds ago

Yeah, not a great apples-to-apples comparison. Nemotron-3 is a bit complicated, but it is near SOTA, and its attention is far less complex than GLM 5.2's (Mamba plus a few GQA layers).

I think the point stands: MoE, a myriad of complex attention approaches, shared layers, you name it. And making it all work together well is a huge trial-and-error pain even for small models, never mind getting to efficient hardware utilization.

lproven 11 minutes ago

> If you look at it, the diagrams are very similar,

The page links to the same site you do. No wonder it is similar -- the source is the same!

charcircuit 10 minutes ago

The source is the same in the original article too. He is using a different diagram from the same site on the right to justify his point on how much more complicated things have become.

christopherwxyz 26 minutes ago

It’s written by AI.

jddj 3 minutes ago

Highly doubtful

lproven 12 minutes ago

[[citation needed]]

I am a professional writer and have been for over 30 years. (I do not use any form of LLM ever.) This means I read a lot. This also means that I have 30+ years of experience of readers not understanding what I wrote, or not getting further than the title, or not getting the main message, or inverting it in their heads, or inserting their own message and then complaining when I diverge, and an endless list of Ways People Do Not Get It.

I am also a trained TESOL teacher. The ability to capture gist is a skill we test for and measure, and many, maybe the majority, of native speakers don't have it and don't know it.

In recent years I constantly see people going "this is written by AI" and I have yet to see a single one of them able to coherently prove their point. It's all just feelings and hunches.

So I am calling you on this:

How do you know? Show your working. Demonstrate your case.

alecco 11 minutes ago

Grammarly and GPTZero say 0% AI.