4× RTX Pro 6000 Blackwell on Water, and the One Card That Wouldn't Behave
29 points by sabareesh 4 days ago | 28 comments

lgessler 11 minutes ago
Not that this really takes away from the substance of the article, but the first two paragraphs are giving heavy Claude smell. Semicolons, em dashes, "That sequencing matters"... I guess I'm just a little surprised that anyone could be arsed to take on a hardware project like this but can't be arsed to write their own introduction.
iagooar 4 minutes ago
Out of curiosity, what are you training with these cards?
tomaytotomato 40 minutes ago
What a time to be alive. I remember, 10 years ago as a poor student, waiting to buy an ATI Radeon X1600 Pro with 256 MB, yes 256 MB, of RAM.

It cost about £190 in 2006.

Now we have GPUs that cost tens of thousands of pounds with insane performance, but what would their price be without the AI and datacentre squeeze?

a012 27 minutes ago
2006 was 20 years ago.
testing22321 22 minutes ago
I remember buying the Radeon PCI with 32MB RAM for $650AUD…
NwtnsMthd 22 minutes ago
It's difficult to speculate about the exact failure from blurry pictures, but the solder on that choke (inductor) looks terrible.

Something went wrong in manufacturing. The solder should have wicked to cover the entire pad, not just a small square, and there should be no (brown) discoloration.

amluto 2 hours ago
I wonder whether those cards ran the model that wrote the nonsense about the forces involved.

Hint: when you have a piece of metal stuck with thermal goop to a lot of components, the force doesn’t “concentrate” on one of them. You need to detach it from each one with however much force is needed to detach it from that component.

sabareesh 49 minutes ago
Not sure what really happened, but some force or bad solder caused it.
cogman10 38 minutes ago
OK, how are people powering these things? 2.4kW is well beyond a standard circuit in the US. Are people having 240V/30A circuits installed? Are they hijacking the dryer plugs? EV charger plugs? Hot tub circuits?
Kirby64 28 minutes ago
240V-20A circuits will handle 3.8kW continuous. It’s probably a 240V-20A circuit, as that is what the power supplies typically want. It's also easy to convert an outlet to 240V if the breaker is dedicated to that outlet: it just requires swapping the breaker and the outlet, not the wires.
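
For anyone checking the 3.8kW figure, here is a rough sketch of the arithmetic. It assumes the common rule of thumb that a continuous load should stay at or below 80% of the breaker rating; the function name and the list of circuits are illustrative only, and none of this is electrical advice.

```python
def continuous_watts(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Continuous power budget of a circuit.

    Assumption: continuous loads are limited to `derate` (80%) of the
    breaker rating, which is a common rule of thumb, not a code citation.
    """
    return volts * breaker_amps * derate


if __name__ == "__main__":
    load_w = 2400  # the 2.4kW figure mentioned upthread
    # Illustrative circuit sizes; only 240V/20A and 240V/30A come from the thread.
    for volts, amps in [(120, 15), (120, 20), (240, 20), (240, 30)]:
        budget = continuous_watts(volts, amps)
        verdict = "fits" if load_w <= budget else "does not fit"
        print(f"{volts}V/{amps}A: {budget:.0f} W continuous, 2.4kW {verdict}")
```

Under that assumption a single 120V circuit falls short of 2.4kW, which is consistent with the two-circuit and 240V setups described in this thread.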
hgoel 28 minutes ago
Chaining two PSUs on separate circuits is also an option. If they're using the Max-Q versions, though, the total GPU power draw is only ~1200W. The bigger question to me is how they are cooling it. Sticking an AC in that room just doubles the power draw issues.
sabareesh 28 minutes ago
It is basically on 2 different circuits/breakers. The ASUS WRX90E supports 2 PSUs as well. You may need to synchronize both PSUs, and several adapters for this are available on Amazon. I'm planning to upgrade it to 240V soon.
tjwebbnorfolk 32 minutes ago
Exactly, I had a 220V 30A circuit installed to run a multi-GPU server in my basement.

I'm air cooling, so I set -pl 450 to avoid running them all at the full 600W.
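
A quick sketch of what the per-card limit does to the combined GPU budget. It assumes four cards, as in the posted build, and that `-pl` refers to the per-card power limit in watts; the ~300 W entry is just the ~1200W total mentioned upthread divided by four. The function name is made up for illustration.

```python
NUM_CARDS = 4  # assumption: four cards, as in the build under discussion


def total_gpu_watts(per_card_limit_w: int, num_cards: int = NUM_CARDS) -> int:
    """Worst-case combined GPU draw if every card sits at its power limit."""
    return per_card_limit_w * num_cards


if __name__ == "__main__":
    limits = [("~300 W per card", 300), ("-pl 450", 450), ("full 600 W", 600)]
    for label, limit_w in limits:
        print(f"{label}: {total_gpu_watts(limit_w)} W across {NUM_CARDS} cards")
```

This counts GPUs only; CPU, pumps, fans, and PSU losses come on top.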

josephg 2 hours ago
Cool post. FYI you might be better off getting one big fan for your "radiator" instead of lots of little fans. Big fans don't need to spin as fast as small fans to push the same amount of air. So they run a lot quieter.
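
A back-of-the-envelope sketch of why that tends to hold, using the idealized fan affinity laws (airflow scales with rpm × D³, static pressure with rpm² × D²). These laws assume geometrically similar fans and constant air density, which real PC fans only approximate, and the 140 mm vs 280 mm comparison is a hypothetical example, not a measurement of this build.

```python
def speed_ratio_same_airflow(d_small_mm: float, d_large_mm: float, n_small: int = 1) -> float:
    """RPM of one large fan relative to each small fan, for equal total airflow.

    Idealized affinity law: airflow is proportional to rpm * D**3.
    `n_small` is how many small fans the single large fan replaces.
    """
    return n_small * (d_small_mm / d_large_mm) ** 3


def pressure_ratio(speed_ratio: float, d_small_mm: float, d_large_mm: float) -> float:
    """Static pressure of the large fan relative to a small one.

    Idealized affinity law: pressure is proportional to rpm**2 * D**2.
    """
    return speed_ratio ** 2 * (d_large_mm / d_small_mm) ** 2


if __name__ == "__main__":
    # Hypothetical: one 280 mm fan replacing four 140 mm fans.
    s = speed_ratio_same_airflow(140, 280, n_small=4)
    p = pressure_ratio(s, 140, 280)
    print(f"relative rpm: {s:.2f}, relative static pressure: {p:.2f}")
```

The pressure term is the catch: a slower, larger fan only helps if it still develops enough static pressure to push air through the radiator fins.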
sabareesh 52 minutes ago
Sure, you may call 140mm fans little, but they do need enough static pressure for the radiators. This setup is already several times quieter than the stock setup.
voidUpdate 2 hours ago
Is that little computer training LLMs from scratch all by itself? That must take years to get any kind of progress, given the scale of training other providers do. Where do you get the training data from?
sabareesh 50 minutes ago
Most of the training I am working on is post-training. You can do so much with a system that is running 24/7.
atemerev 56 minutes ago
You can train TinyStories in a few hours on retail hardware, and it is a highly illuminating experience that I can recommend to everyone.
robin_reala 60 minutes ago
Complete side note, but I can’t work out how the author managed to mistype “at” as “Δt”.

Edit: reading fail on my part, nothing to see here.

xmichael909 59 minutes ago
I caught that too, probably a Qwen bug (;
OneDeuxTriSeiGo 58 minutes ago
huh? the only Δt in the article is used correctly.

> With 18× 140 mm of surface, the fans run quietly and the coolant Δt across the rads stays small
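
For context, the coolant Δt in a loop follows from the steady-state heat balance Δt = P / (ṁ · c_p). A minimal sketch, assuming plain water as the coolant and a hypothetical flow rate of 4 L/min (the article excerpt gives no flow figure); the 2400 W heat load is the 2.4kW number from elsewhere in this thread.

```python
WATER_CP_J_PER_KG_K = 4186.0  # specific heat of water, approximate
WATER_KG_PER_L = 1.0          # density of water, approximate


def coolant_delta_t(heat_w: float, flow_l_per_min: float) -> float:
    """Steady-state coolant temperature change across the loop, in kelvin.

    Uses delta_t = P / (mass_flow * c_p) and assumes all heat goes into
    the coolant and the coolant behaves like plain water.
    """
    mass_flow_kg_per_s = flow_l_per_min / 60.0 * WATER_KG_PER_L
    return heat_w / (mass_flow_kg_per_s * WATER_CP_J_PER_KG_K)


if __name__ == "__main__":
    # 4 L/min is an assumed flow rate, not a figure from the article.
    print(f"{coolant_delta_t(2400, 4.0):.1f} K")
```

Doubling the flow rate halves the Δt for the same heat load.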

robin_reala 53 minutes ago
Hah, wow, I completely misread it. Delta-t makes sense when you get the context right, thanks.
sandworm101 26 minutes ago
Ditch the tiny DC fans. Build a shroud and switch to a single AC-powered industrial blower / duct fan.
sabareesh 4 days ago
Converting four RTX PRO 6000 Blackwell cards to waterblocks, finding a VRM choke loose on the workbench, and getting back to 41k tok/s.
atemerev 57 minutes ago
If you want ready-made, well-engineered, water-cooled multi-GPU research workstations, my colleagues at https://comino.com build and sell them. Or you can purchase fitted waterblocks from them for many GPUs and build your own.
warpfactor 34 minutes ago
AI slop post.
stryakr 9 minutes ago
https://news.ycombinator.com/item?id=48557170

I picked up on it too. This wouldn't have been difficult to share, but it's far too verbose to be a real person's words.

stryakr 11 minutes ago
Why does this post sound like it's an AI story based on the inputs from the engineer?

The phrasing is very Claude-like:

"That cracked joint is the whole story. The card had passed initial bring-up and ran fine at light loads for a week."

"That sequencing matters — it’s why we have a story to tell. The pilot card failed, taught us a lesson, and the lesson is the reason the other three went on without incident."

"Driver swaps, CUDA reinstalls, and inference-engine theories were dead ends I spent hours on. The failure pattern itself told the story — listen to it earlier."
