Hacker News

87 points by kevinsimper 3 hours ago | 38 comments

tekacs 21 minutes ago

As they start to release more proprietary models, I so wish that they partnered with one of the major US hyperscalers to allow using these models through something US-domiciled.

Totally understand why it may not be reasonable or in their best interest (and that the US is _absolutely_ not doing the same reflexively). But it would be lovely to be able to try these out on production workloads in earnest.

embedding-shape 19 minutes ago

Unless US hyperscalers do the same in reverse, I hope the status quo stays as it is. Either people are happy to share, and the sharing should happen both ways, or US hyperscalers can keep isolating themselves as they've done so far.

adjejmxbdjdn 3 minutes ago

I do hope the U.S. hyperscalers do the same as well.

In an ideal world, U.S. residents would use Chinese AI models and Chinese residents would use U.S. AI models.

Governments in both countries are collecting data for nefarious reasons. But the Chinese government has far less influence on a U.S. resident and vice versa.

We are all better off if our data is collected by a government halfway across the world instead of our own governments which hold incredible amounts of power over us.

tarruda 53 minutes ago

Looking forward to more open weight releases from Qwen, especially 122B and 397B.

smcleod 48 minutes ago

Yeah, that ~60-150B range is such a sweet spot for current 'prosumer' hardware. I'd love to see something like a 120B-A14B or thereabouts.

tarruda 42 minutes ago

I have a 128GB Mac Studio, and even 397B was a happy surprise to me due to its high quantization resilience.

I've created a 2.54 BPW quant that fits on my hardware with 128k context, 20 tps token generation (TG) and 200 tps prompt processing (PP), while maintaining high scores on many benchmarks: https://huggingface.co/tarruda/Qwen3.5-397B-A17B-GGUF/discus...
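
The claim that a 2.54 BPW quant of a 397B model fits in 128GB can be checked with simple arithmetic. A minimal sketch, assuming 2.54 is the average over all weights and that the Mac's "128GB" is 128 GiB; it counts weights only (no KV cache for the 128k context, no runtime overhead). The second helper is a common rule-of-thumb ceiling, not a prediction: if decoding is memory-bandwidth-bound, each generated token streams the ~17B active parameters of this A17B MoE at least once, so bandwidth divided by those bytes bounds TG. The 800 GB/s figure is Apple's published M1 Ultra spec; the function names are illustrative.

```python
GIB = 2**30  # bytes per GiB


def quantized_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes for the weights alone at an average bits-per-weight."""
    return n_params * bits_per_weight / 8


def decode_tps_ceiling(active_params: float, bits_per_weight: float,
                       bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if every token must read all active weights once.

    Ignores KV-cache reads, compute limits and scheduling overhead, so real
    TG lands well below this number.
    """
    return bandwidth_bytes_per_s / quantized_weight_bytes(active_params, bits_per_weight)


weights = quantized_weight_bytes(397e9, 2.54)
print(f"weights: {weights / GIB:.1f} GiB of 128 GiB")  # weights: 117.4 GiB of 128 GiB
# ASSUMPTION: ~17B active params per token (the "A17B"), 800 GB/s M1 Ultra bandwidth.
print(f"TG ceiling: {decode_tps_ceiling(17e9, 2.54, 800e9):.0f} tok/s")  # TG ceiling: 148 tok/s
```

That leaves roughly 10 GiB for the KV cache, the OS and everything else, which is why the fit is tight but workable; the reported 20 tps sits far below the bandwidth ceiling, as expected for a bound that ignores everything but weight reads.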

chrisweekly 8 minutes ago

The Apple Store's current options for the Mac Studio seem to max out at 96GB. I'm questioning the ROI, especially given it's not upgradeable. Curious about others' takes on new Mac hardware.

ttoinou 35 minutes ago

Better than antirez's ds4?

tarruda 28 minutes ago

I only tried a very early version of that when it was just a llama.cpp fork and Qwen was certainly better in my tests.

But I was not super impressed with DeepSeek 4 Flash from the official API either, so it doesn't seem to be the quantization's fault. It is a good model, but nothing out of the ordinary in the few benchmarks I ran on it (with full awareness that benchmarks are biased).

gcr 44 minutes ago

What’s the price point for getting into that sweet spot?

I’m on an M1 Max with 32GB VRAM, so I’m looking forward to the 27B or 35B-A3B models. Is dropping $5k for an RTX 6000 or a DGX Spark really the best option?

tempoponet 24 minutes ago

Expect to pay $4k-10k

- Your RTX 6000 is closer to $10k now

- Sparks are creeping into the $4-5k range

- AMD Strix Halo systems are ~$3.5k

- Apple depends on chipset and memory. The sweet spot would be a 128GB M3 Ultra, probably $6-8k, but admittedly I haven't been tracking closely. An M5 refresh might come in the fall. You can get a new 128GB M5 Max laptop for ~$5-6k today.

- a 4x3090 rig would take $5-6k

Every platform has tradeoffs, but it's mostly ecosystem, memory bandwidth, and power consumption. They're all slow. The best option is likely to rent hardware on Runpod. The ROI on self-hosting is very low unless you have a specific need or you're OK treating it as a hobby.
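
The rent-vs-own ROI point can be made concrete with break-even arithmetic. A minimal sketch: the function name and every number in the usage line (hardware price, resale value, rental rate, power draw, electricity price) are hypothetical placeholders, not quotes from Runpod, Vast.ai or any vendor. It deliberately ignores utilization, depreciation timing, and the non-cost benefits of local hardware.

```python
def break_even_hours(hardware_cost: float, resale_value: float,
                     rental_rate_per_hour: float, power_kw: float,
                     electricity_per_kwh: float) -> float:
    """Hours of use after which owning costs less than renting.

    Owning costs (hardware_cost - resale_value) up front plus electricity per
    hour; renting costs rental_rate_per_hour. Break-even is where they match.
    """
    local_hourly = power_kw * electricity_per_kwh  # running cost of the local box
    saving_per_hour = rental_rate_per_hour - local_hourly
    if saving_per_hour <= 0:
        raise ValueError("rental is not pricier per hour than local power; owning never breaks even")
    return (hardware_cost - resale_value) / saving_per_hour


# HYPOTHETICAL inputs: a $6,000 box (inside the $4k-10k range above), no resale,
# a made-up $1.00/hour rental rate, 0.3 kW average draw, $0.15/kWh electricity.
hours = break_even_hours(6000, 0, 1.00, 0.3, 0.15)
print(f"{hours:.0f} hours, about {hours / (8 * 365):.1f} years at 8 h/day")
```

Plugging in real local prices and rental rates, plus an honest estimate of daily usage, is what decides whether the ROI is "very low" for a given person.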

anonym29 18 minutes ago

Bosgame M5 (Strix Halo) w/ 128 GB still goes for $2800 right now. SH systems have surged in price dramatically but quite unevenly.

>The best option is likely to rent hardware on Runpod.

Vast.ai is much cheaper, but the broader point here is contestable. The only dimension in which cloud GPU rentals win is cost. You lose the confidentiality, integrity, and availability benefits of local deployments.

embedding-shape 22 minutes ago

If I could find an RTX Pro 6000 for $5K I'd definitely grab it. I'm running RedHatAI/Qwen3.6-35B-A3B-NVFP4 on one (I had to pay closer to $10K for it, though) with 260K context and it's a blast! ds4 by antirez also works well, and even IQ2_XXS seems to work relatively well, but Qwen3.6-35B-A3B-NVFP4 is both faster and gives higher-quality responses (at least for coding and translation, which is what I mostly use them for).

ttoinou 34 minutes ago

M5 Max with 64GB (the sweet spot) or 128GB (only 1000 USD more, better to keep it for the future) offers the best quality/price ratio: future-proof, reliable, resellable, and flexible across workloads. Being harder to use as a server might be the only drawback.

throwaw12 28 minutes ago

What do you recommend for a non-Mac setup? I am a Mac user, but it's getting expensive, and I'm not seeing a reason to jump to the latest M5.

roger_ 29 minutes ago

M5 Max 128GB for $1k?

tempoponet 18 minutes ago

The memory upgrade is $1k on a MacBook Pro. The laptop is ~$5500.

smallerize 21 minutes ago

I think they mean the upgrade to 128GB is +$1k.

tarruda 37 minutes ago

> What’s the price point for getting into that sweet spot?

In October 2024 I got my Mac Studio M1 Ultra with 128GB; IIRC it was ~$2500. With the recent price explosion, it has certainly gotten more expensive. https://frame.work/ is selling a 128GB Strix Halo mainboard for $2700, but you have to add storage and a case.

anonym29 33 minutes ago

Strix Halo at $2k with similar TG and about half the PP of DGX Spark was a pretty good deal IMO, especially considering it's also a full x86 system... 16c/32t Zen 5, 40 CU RDNA 3.5, 128 GB unified memory at ~220 GB/s real-world speeds (256 GB/s theoretical) - that runs full tilt at 140W in performance mode and idles at ~10W.

Unfortunately, the prices rose on these a lot, but unevenly. Beelink GTR 9 Pro is $4400, Framework Desktop is ~$3500, for what is basically the exact same mainboard as a Bosgame M5 for $2800.

Apple's M5 Max is another attractive option. Apple silicon traditionally had great MBW and was good at TG but struggled with PP; the new neural engines in its GPU cores have made a big, positive difference here.

Gorgon Halo is rumored for a June announcement and a Q4'26 release, with basically +100 MHz clocks over Strix Halo, LPDDR5X-8533 instead of LPDDR5X-8000, and, more importantly, 192 GB max instead of 128 GB.

I'd say it's better to wait for Gorgon Halo than to grab Strix Halo now. However, Medusa Halo, rumored for H2'27, is slated to have up to 26c Zen 6 (heterogeneous cores; kind of funny that AMD is heading toward these as Intel retreats from them), 48 CU of RDNA 5 instead of 40 CU of RDNA 3.5, and a 384-bit bus with LPDDR6, which should allow 256 GB at more like ~490-600 GB/s MBW. That will really make Strix and Gorgon Halo obsolete.
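
The bandwidth numbers in this comment follow from bus width times transfer rate. A minimal sketch: Strix Halo's 256-bit LPDDR5X-8000 bus reproduces the 256 GB/s theoretical figure; applying the same formula to Gorgon Halo assumes it keeps a 256-bit bus, which is not confirmed here. For Medusa Halo the formula is inverted to show what transfer rate a 384-bit bus would need to land in the rumored ~490-600 GB/s range. The formula ignores any protocol or metadata overhead, so treat the results as approximate; function names are illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak in GB/s: bytes per transfer x millions of transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s / 1000


def required_rate_mt_s(bus_width_bits: int, target_gb_s: float) -> float:
    """Transfer rate (MT/s) a bus of this width needs to reach target_gb_s, ignoring overhead."""
    return target_gb_s * 1000 / (bus_width_bits / 8)


print(peak_bandwidth_gb_s(256, 8000))  # Strix Halo: 256-bit LPDDR5X-8000
print(peak_bandwidth_gb_s(256, 8533))  # Gorgon Halo, ASSUMING the same 256-bit bus
for target in (490, 600):              # rumored Medusa Halo range on a 384-bit bus
    print(target, round(required_rate_mt_s(384, target)))
```

Real-world throughput lands below these peaks: the ~220 GB/s measured versus 256 GB/s theoretical quoted for Strix Halo is about 86%.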

Also worth keeping an eye out for Serpent Lake (Intel CPU + NVIDIA iGPU on a single board with unified memory, rumored for 2028-2029 IIRC) and for the 160 GB Crescent Island Intel dGPU.

guitcastro 16 minutes ago

I am still waiting for Qwen-Image-Edit 2.0 open weights.

mixtureoftakes 35 minutes ago

I'm more excited for Qwen3.7 9B and 72B; these are usually so good for their size.

goyozi 3 hours ago

These are very good numbers. I still don’t get why they don’t compare against latest competitor versions in these posts, it’s not like we’re all not going to notice.

NiloCK 4 minutes ago

I find it forgivable if it's within a minor version bump. (NB: x.5 is now a de facto major-version bump for LLMs, for whatever reason.)

Even with LLMs, posts like this don't just fall out of a coconut tree. If you have a set of target benchmarks for your own model, then keeping "the set" of side-by-side comparable models is its own maintenance headache.

Aurornis 20 minutes ago

I think the argument is that they're trying to suggest they're within N months of SOTA.

Realistically I assume they hope readers don’t notice the fine details.

The Qwen models are great for open weights but for every past release they haven’t performed as well as the benchmarks in my experience. They’re optimizing for benchmark numbers because they know it works.

beydogan 6 minutes ago

Honestly, the initial version of Opus-4.6 was much better than whatever we are being served right now as 4.7. If it performs at the same level as that, I'm totally willing to switch.

htrp 46 minutes ago

I think it's part of the expectation setting (with a side of "we did our distillation/eval harness on a specific model").

If they say it's 4.7-comparable, it anchors that in your head as the model to evaluate against.

hmokiguess 2 hours ago

This puzzles me too; I want to know.

maelito 41 minutes ago

Marketing.

bratao 2 hours ago

It is super strange that in the last (3?) releases they keep comparing against older models such as Opus-4.6.

vessenes 54 minutes ago

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor OpenAI's 5.5 is listed on the benchmarks. So there’s also just the time it takes others to get benchmarking done and published.

bsenftner 48 minutes ago

Any reports from people using their coding agent(s)?

XCSme 18 minutes ago

Any info on pricing and latency?

howmayiannoyyou 19 minutes ago

I can't bring myself to use any model that trains or sends telemetry back to my country's primary competitor/adversary. I don't care how much money is saved.

Mashimo 12 minutes ago

That is understandable. Just don't do it. No need to announce it.

InsideOutSanta 4 minutes ago

As somebody in Europe, uh, that doesn't leave many options.

dfansteel 19 minutes ago

Can anyone check its knowledge base for me? I’m honestly not able to run it and the Qwen models I can run censor information critical towards the Chinese government.

Tiananmen Square is the first place to start.

Mashimo 10 minutes ago

> I’m honestly not able to run it

What do you mean? This isn't self-hosted; it's closed source. And any website that targets China or is hosted in China will probably censor Tiananmen Square.