Hacker News

Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving

106 points by mfiguiere 2 hours ago | 40 comments

ninjahawk1 14 minutes ago

The way to develop in this space seems to be to give away free stuff, get your name out there, then make everything proprietary. I hope they still continue releasing open weights. The day no one releases open weights is a sad day for humanity. Normal people won’t own their own compute if that ever happens.

visarga 4 minutes ago

I think it is in the interest of chip makers to make sure we all get local models

zozbot234 2 minutes ago

Definitely. Many big hardware firms are directly supporting HuggingFace for this very reason.

jjice 2 hours ago

With them comparing to Opus 4.5, I find it hard to take some of these in good faith. Opus 4.7 is new, so I don't expect that, but Opus 4.6 has been out for quite some time.

vidarh 7 minutes ago

When Sonnet 4.6 was released, I switched my default from Opus to Sonnet because it was about on par with Opus 4.5. While 4.6 and 4.7 are "better", the leap is too small for most tasks for me to need it, and so reducing cost is now a valid reason to stay at that level.

If even cheaper models start reaching that level (GLM 5.1 is also close enough that I'm using it a lot), that's a big deal, and a totally valid reason to compare against Opus 4.5.

Someone1234 57 minutes ago

If money is no object, then nothing else is worth considering if it isn't Codex 5.4/Opus 4.7/SOTA. But for many to most people, value vs. relative quality are huge levers.

Even many people on a Claude subscription aren't choosing or able to choose Opus 4.7 because of those cost/usage pressures. They often use Sonnet or an older Opus because of the value vs. quality curve.

dd8601fn 21 minutes ago

Also us weirdos with local model uses. But your point stands.

seplite 15 minutes ago

Unfortunately, like with the release of Qwen3.6-Plus, this model also isn’t released for local use. From the linked article: “Qwen3.6-Max-Preview is the hosted proprietary model available via Alibaba Cloud Model Studio”

zozbot234 14 minutes ago

The Max series was never available for local use, though. So this is expected.

CamperBob2 21 minutes ago

Cost may or may not be a factor in my choice of model, but knowing the capabilities and knowing they will remain consistent, reliable, and available over time is always a dominant consideration. Lately, Anthropic in particular has not been great at that.

wahnfrieden 55 minutes ago

Codex subscription is very generous at pro tiers

oidar 47 minutes ago

Opus 4.6 performance has been so wildly inconsistent over the past couple of months, why waste the tokens?

hirako2000 54 minutes ago

You compare with what's most comparable.

In any case, a benchmark provided by the provider is always biased: they will pick the frameworks where their model fares well and omit the others.

Independent benchmarks are the go-to.

alex_young 51 minutes ago

"Quite some time" is a little over 2 months. I understand this is actually true right now, but it’s still a bit hard to accept.

bluegatty 12 minutes ago

I think it's only been like 10 weeks. I mean, that's forever in AI time, but not a long time in normie people time.

atilimcetin 10 minutes ago

Nowadays, I'm working on a realtime path tracer where you need a proper understanding of microfacet reflection models, PDFs, (multiple) importance sampling, ReSTIR, etc. That said, mine is a pretty unique, specific use case.

And I've been using Claude, Gemini, and Qwen to double-check my math and my code, and to get practical information to make my path tracer more efficient. Claude and Gemini failed me a couple of times with wrong, misleading and unnecessary information, but on the other hand Qwen always gave me proper, practical and correct information. I almost stopped using Claude and Gemini so as not to waste my time anymore.

Claude Code may shine at developing web applications, backends and simple games. But it's definitely not for me. And this is the story of my specific use case.

zozbot234 5 minutes ago

What size of Qwen is that, though? The largest sizes are admittedly difficult to run locally (though this is an issue of current capability wrt. inference engines, not just raw hardware).

atilimcetin 4 minutes ago

I'm directly using https://chat.qwen.ai and planning to switch to Qwen Code with subscription.

trvz 60 minutes ago

The fun thing is, you can be aware of the entire range of Qwen models that are available for local running, but not at all aware of their cloud models.

I knew of all the 3.5’s and the one 3.6, but only now heard about the Plus.

Alifatisk 7 minutes ago

Their Plus series has existed since Qwen Chat was available, as far as I remember. I can at least remember trying out their Plus model early last year.

0xbadcafebee 22 minutes ago

Everybody's out here chasing SOTA, meanwhile I'm getting all my coding done with MiniMax M2.5 in multiple parallel sessions for $10/month and never running into limits.

Aurornis 4 minutes ago

For serious work, the difference between spending $10/month and $100/month is not even worth considering for most professional developers. There are exceptions like students and people in very low income countries, but I’m always confused by developers in careers where six figure salaries are normal who are going cheap on tools.

I find even the SOTA models to be far away from trustworthy for anything beyond throwaway tasks. Supervising a less-than-SOTA model to save $10 to $100 per month is not attractive to me in the least.

I have been experimenting with self hosted models for smaller throwaway tasks a lot. It’s fun, but I’m not going to waste my time with it for the real work.

Oras 51 minutes ago

I find it odd that none of OpenAI's models was used in the comparison, but Z's GLM 5.1 was. Is Z (GLM 5.1) really that good? It is crushing Opus 4.5 in these benchmarks; if that is true, I would have expected to read many articles on HN on how people flocked from CC and Codex to use it.

coder68 16 minutes ago

In fact it is appreciated that Qwen is comparing to a peer. I myself and several engineers I know are trying GLM. It's legit. Definitely not the same as Codex or Opus, but cheaper and "good enough". I basically ask GLM to solve a problem, walk away for 10-15 minutes, and the problem is solved.

Oras 9 minutes ago

Cheaper is quite subjective. I just went to their pricing page [0], and the cost saving compared to performance does not sell it well (again, personal opinion).

CC has a limited capacity for Opus, but fairly good for Sonnet. For Codex, I never had issues hitting my limits, and I'm only a Pro user.

[0] https://z.ai/subscribe

ac29 44 minutes ago

GLM 5.1 is pretty good, probably the best non-US agentic coding model currently available. But both GLM 5.0 and 5.1 have had issues with availability and performance that make them frustrating to use. Recently GLM 5.1 was also outputting garbage thinking traces for me, but that appears to be fixed now.

cmrdporcupine 28 minutes ago

Use them via DeepInfra instead of z.ai. No reliability issues.

https://deepinfra.com/zai-org/GLM-5.1

Looks like fp4 quantization now though? Last week was showing fp8. Hm..

wolttam 19 minutes ago

Deepinfra's implementation of it is not correct. Thinking is not preserved, and they're not responding to my submitted issue about it.

I also regularly experience Deepinfra slow to an absolute crawl - I've actually gotten more consistent performance from Z.ai.

I really liked Deepinfra but something doesn't seem right over there at the moment.

Alifatisk 15 minutes ago

GLM-5 is good, like really good. Especially if you take pricing into consideration. I paid $7 for 3 months. And I get more usage than CC.

They have difficulty supplying their users with capacity, but in an email they pointed out that they are aware of it. During peak hours, I experience degraded performance. But I am on their lowest tier subscription, so I understand if my demand is not prioritized during those hours.

kardianos 45 minutes ago

Yes. GLM 5.1 is that good. I don't think it is as good as Claude was in January or February of this year, but it is similar to how Claude runs now, perhaps better because I feel like its performance is more consistent.

pros 44 minutes ago

I've been using GLM 5.1 for the last two weeks as a cheaper alternative to Sonnet, and it's great - probably somewhere between Sonnet and Opus. It's pretty slow though.

c0n5pir4cy 43 minutes ago

I've been using it through OpenCode Go and it does seem decent in my limited experience. I haven't done anything which I could directly compare to Opus yet though.

I did give it one more complex task, and I was quite impressed. I had a local setup with Tiltdev, K3S and a pnpm monorepo which was failing to run the web application dev server; GLM correctly figured out that it was a container image build cache issue after inspecting the containers etc. and corrected the Tiltfile and build setup.

cleaning 13 minutes ago

Most HN commenters seem to be a step behind the latest developments, and sometimes miss them entirely (Kimi K2.5 is one example). Not surprising, as most people don't want to put in the effort to sift through the bullshit on Twitter to figure out the latest opinions. Many people here will still prefer the output of Opus 4.5/4.6/4.7; nowadays this mostly comes down to the aesthetic choices Anthropic has made.

Oras 5 minutes ago

Not just aesthetics though. From time to time I implement the same feature with CC and Codex just to compare results, and I have yet to find Codex making better decisions or even matching the completeness of the feature.

For more complicated stuff, like queries or data comparison, Codex always seems behind for me.

throwaw12 44 minutes ago

Maybe they decided OpenAI has a different market, hence comparing only with companies that are focusing on dev tooling: Claude, GLM

edwinjm 31 minutes ago

Haven’t you heard about Codex?

throwaw12 22 minutes ago

It's an SKU from OpenAI's perspective; the broader goal and vision is (was) different. Look at Claude and GLM: both were 95% committed to dev tooling, with the best coding models, a coding harness, and even their Cowork built on top of Claude Code.

zozbot234 8 minutes ago

I'm not sure how this makes sense when Claude models aren't even coding specific: Haiku, Sonnet, Opus are the exact same models you'd use for chat or (with the recent Mythos) bleeding edge research.

__blockcipher__ 44 minutes ago

Yeah GLM’s great for coding, code review, and tool use. Not amazing at other domains.

esafak 46 minutes ago

I use it and think its intelligence compares favorably with OpenAI and Anthropic. The biggest downside is its speed.
