Unsloth GLM-5.2 – How to Run Locally
46 points by TechTechTech 2 hours ago | 14 comments

xrd 41 minutes ago
So close! My machine with 192GB RAM + RTX 3090 24GB can almost run this. It says it needs 24GB of VRAM and 256GB of RAM for MoE offloading.

https://unsloth.ai/docs/models/glm-5.2#usage-guide

In a prior thread, someone said it would take $500k in hardware:

https://news.ycombinator.com/item?id=48629970

reply
cheema33 25 minutes ago
I have the RAM, but not the VRAM. What kind of speed/tps could you expect from a 3090 with 24GB of VRAM? I am somewhat tempted to pick up a GPU with 24GB of VRAM.
reply
mgambati 37 minutes ago
With 2-bit you wouldn't get good results. The ideal range for coding is at least Q8.
reply
kibibu 30 minutes ago
According to this very article, 4-bit dynamic is essentially lossless
reply
pheggs 24 minutes ago
I feel like the gap is closing to be able to run good enough models locally even for coding and I would assume it could make some companies a bit nervous. Am I wrong about that?
reply
fny 23 minutes ago
The RAM requirements are still pretty painful.
reply
yieldcrv 4 minutes ago
equilibrium in one or two more years on the consumer/prosumer side

think Apple M6 or M7 with a currently unforeseen denser memory style, 256gb RAM

a couple inference or cache improvements on the algorithmic side, using less ram for context windows and doubling token speed again

denser open source models, packing more experts for smaller active layers

it'll still be expensive but like $8,000 - $13,000 instead of $450,000 worth of B200s

reply
cogman10 14 minutes ago
I don't think so. I could easily see a company deciding to host and run these models for their own development. If you have a dev team of about 10 people, a one-time $50k investment in an LLM server has to be pretty tempting. Unlimited tokens, decent performance, upgrade options, and potential product integrations.

For companies wanting LLMs in their products in general, I have to think going the local LLM route is even more tempting. Somewhat dumb models are more than good enough for a lot of the things people are integrating LLMs into their products for.

reply
CamouflagedKiwi 19 minutes ago
The hardware requirements to run this locally are still very high. Seems far enough off mainstream for those companies not to be too worried yet.
reply
zuzululu 25 minutes ago
I wonder if AMD's new AI chip can run this with ease? I'm seriously considering buying it. GLM 5.2 is just shy of GPT 5.4, so I would welcome offloading any grunt work locally.

I am very excited for local LLMs. I think we may have GPT 5.5-xhigh level of performance for under 2000 EUR.

This should put more pressure on the frontier models to avoid sitting on any fancy stuff and lower token prices as a whole.

Nothing beats a local LLM disconnected from the cloud.

reply
benjiro29 9 minutes ago
"GLM 5.2 is just shy of GPT 5.4"... If you're running the full model. As in, you have 750GB (FP8) to 1.5TB (FP16) of memory available.

Do not mix the benchmark results of GLM 5.2 FP16/FP8 with FP4 or FP2.
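
For a rough sense of where those memory figures come from, here is a back-of-the-envelope sketch. It assumes a parameter count of about 750B, which is only inferred from the 1.5TB-at-FP16 figure above and is not an official number. It counts weights only: KV cache, activations, and the mixed-precision layers that real dynamic quants keep at higher bit widths are ignored, so actual files will be larger.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~750B parameters, back-derived from "1.5TB at FP16" (2 bytes/weight).
N_PARAMS = 750e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    size = weight_footprint_gb(N_PARAMS, bits)
    print(f"{label}: ~{size:.0f} GB of weights")
```

Under these assumptions the 2-bit weights alone land close to the 192GB mark, which is why a 192GB machine is borderline once context and runtime overhead are added.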

* FP4 will mean an accuracy loss of about 3%. Not noticeable, but more chance for mistakes.
* FP2, which is what most people are able to run at home for a "reasonable" price: you're looking at over 17% loss in accuracy.

At that point, you're running at less than claude-sonnet-4.6, as the issues compound with accuracy losses. And reasonably priced is still in the ~$5000 range (192GB + 32GB GPU for the active/KV cache system).

For that price you could be using a Codex / Claude Pro subscription for the next 4+ years with better models (by default), let alone compared with an FP2 GLM 5.2 version. And you're looking at < 10 tps. A Mac Studio with 512GB will net you 18 to 20+ tps with FP4, but ... I mean, those used to be $10,000.
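
The subscription comparison can be sanity-checked with a simple break-even calculation. This is only a sketch: the $100/month fee is a hypothetical placeholder chosen for illustration, not a quoted plan price, and electricity, resale value, and hardware price changes are ignored.

```python
def breakeven_months(hardware_cost: float, monthly_fee: float) -> float:
    """Months of subscription that the hardware budget would pay for."""
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return hardware_cost / monthly_fee


HARDWARE_COST = 5000.0  # from the comment: a ~$5000 local system
MONTHLY_FEE = 100.0     # assumption: hypothetical subscription price

months = breakeven_months(HARDWARE_COST, MONTHLY_FEE)
print(f"{months:.0f} months (~{months / 12:.1f} years)")
```

With these placeholder numbers the budget covers a bit over four years of subscription; a cheaper or pricier plan shifts the result proportionally.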

Unfortunately the local hardware cost is a major issue for running large models like that.

reply
nh43215rgb 5 minutes ago
Even with the upcoming AI Max+ PRO 495 we are capped at 192GB, so no...
reply
Iolaum 18 minutes ago
At full precision GLM 5.2 may be close to GPT 5.4. But at Q2, or whatever one needs in order to run it on a prosumer device, it will be worse.

Also, I'm not sure where you are getting the under 2k value. I bought a Framework Desktop 128GB last year and my setup was around 2.7k. The same setup now sells for around 4.7k.

reply
kccqzy 14 minutes ago
The AMD 395 chip supports up to 128GB unified RAM. So still not enough even at 1-bit quant unfortunately.
reply