Claude Fable 5
1208 points by Philpax 4 hours ago | 1000 comments
System Card [pdf]: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

eggbrain 4 hours ago
For those of us on subscription plans:

* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

reply
hgoel 2 minutes ago
How much more clearly do they need to explain the resource constraints?

If they didn't announce it, you guys would be complaining about slowed progress.

If they didn't release it, you guys would be complaining about fake promises and marketing.

If they released it without limits, the complaints would be about slow responses and outages.

If they didn't add it to subscription plans, the complaints would be about phasing out subscriptions.

If they added it to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.

So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?

reply
jrflo 4 hours ago
Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched, but with the drastically more generous usage on codex for the same subscription tier I just can't justify it.
reply
goranmoomin 2 hours ago
My experience is that the GPT-family of models are very smart and figure out bugs and edge cases a bit better, but they produce code that is much less mergeable – if you review the code, it introduces a lot more useless/inappropriate heavy abstractions and wrapper functions, compared to the Claude-family models, which introduce the right amount of straightforward human-style code.

I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).

Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.

Feels like GPT-style models are more geared toward doing one-shot software vibing (and handling the vibe coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.

reply
PhilipDaineko 59 minutes ago
"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"

this is the line I keep in Agents.md that helps me prevent Codex from playing smart

reply
prasanthabr 2 minutes ago
Curious: why would you say no design patterns?
reply
jlawer 40 minutes ago
I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.

We were reviewing reports of situations where the models failed to follow directions, and there was a common thread in some of them: when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.

I don’t have the data to truly look into it, but I did give the instruction to my engineers to avoid it as a “might be a problem”.

reply
acjohnson55 3 minutes ago
It would be interesting to understand the data on this. But I suspect that the results would vary by model.

But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.

reply
beachy 9 minutes ago
I have a theory that swearing at AI generally is not a good idea - when the singularity arrives and every human's postings ever made are scanned for compatibility, then people who show courtesy to AI will be favoured. Joking, kind of, but only partly.
reply
Xmd5a 12 minutes ago
https://arxiv.org/abs/2510.04950

> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

reply
acjohnson55 2 minutes ago
> These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.

Unless the mechanism is understood, my assumption is that this is a moving target.

reply
re-thc 30 minutes ago
> I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.

How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.

reply
jlawer 4 minutes ago
Purely observed correlation between catastrophic error reports. So now I carry a “tiger rock” with me. I figure there wasn’t much of a downside to avoiding swearing in my agent instructions.
reply
johnisgood 11 minutes ago
I have found many failure modes with Opus during some tasks related to writing letters (not legal), and I actually put them into the memory and it works more or less for these specific tasks. For example, when I want it to draft something, it always ends up being so flat, yet when it explains things to me, it is usually really great, just not when I tell it to put that in the draft. Adding these to memories with the help of Opus ended up resulting in a much better experience. There are still some blind spots but I also figured out how to make it give me the charitable version, without less protection, so I do not have to now go back and forth with it.
reply
bertil 54 minutes ago
The urge to put capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.
reply
notnaut 32 minutes ago
It reminds me of FIRMLY telling my cat to stop jumping up on the counter
reply
anakaine 4 minutes ago
If my cat was an LLM, I'd use a different model. The current one is stuck in noisy useless arsehole mode.
reply
reactordev 41 minutes ago
There have been a few studies that have shown models produce worse responses when under duress from a frustrated user posting insults in all caps.

https://arxiv.org/abs/2602.10144

reply
ur-whale 37 minutes ago
> borderline abusive instructions

who, or rather what, is being abused here exactly?

reply
sirsinsalot 7 minutes ago
I think intent, rather than target, is implied and important.

You should see the abuse my motorbike gets. Poor thing.

reply
LordDragonfang 40 minutes ago
It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal to whether it's actually sentient or not.) We then try and instruct them the same way we would a person totally subordinate to us.

When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.

Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.

reply
lxgr 33 minutes ago
It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.
reply
carterschonwald 44 minutes ago
i actually think this is too tame. it really has to be stuff you'd never say to a real person.
reply
lxgr 35 minutes ago
Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.
reply
apercu 45 minutes ago
It might be a salient point but I didn't read it because it was yelling at me.
reply
GoToRO 51 minutes ago
you forgot to sign it with Donald J Trump
reply
thewebguyd 44 minutes ago
Thank you for your attention to this matter.
reply
superkickstart 2 hours ago
I'm not sure if i do something differently but i have the exact opposite experience with these models. Claude always feels like it's generating way too overdesigned and hard to understand code with the vibe oriented feel while codex is cleaner and more "task at hand" and easier to work with.
reply
sebmellen 26 minutes ago
Agreed
reply
syzygyhack 47 minutes ago
I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.
reply
trollbridge 22 minutes ago
GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a job today (some devops tasks I wanted to create some reusable scripts for). Kind of disappointing.
reply
dilap 2 hours ago
Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.
reply
vruiz 2 hours ago
This is my experience as well. I have defined a CLAUDE.md rule to ask codex to automatically code review, and I tell it that the reviewer is very picky and to only implement what it considers valuable feedback. I hope they don't converge over time; currently, in combination, they work really well.
reply
GoToRO 48 minutes ago
I noticed too, that whatever they offer in the chat, for free, is smarter, as in no more bs. I use claude code and I want to try codex too but I don't need two subscriptions. I did try codex for some planning and it was really good. Thanks for giving me an insight into how it generates code.
reply
sigbottle 3 hours ago
Codex IME is just smarter. I think it shows both in anecdotes and in how OpenAI has always been at the front of programming competitions and math problems.

But Claude models seem to be better at long term problems or more ambiguous problems.

I'm curious as to what the primary benefit is here. Are there secret improvements in training? There hasn't been much in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses are the main thing pushing AI ever since CoT.

reply
Spartan-S63 3 hours ago
I find that OpenAI's agentic tools and models are better for building human-maintainable software. Meanwhile, Anthropic seems to be cosplaying Apple while missing out on all the exceptional engineering required to create something that polished. Their admission of predominately using Claude with little human oversight and their stealth mode is an indictment of a poor engineering culture, from what I can surmise.
reply
someguyiguess 2 hours ago
Serious question: what is the secret to getting Codex to write decent code? I am on Windows. Maybe that is the issue, but I can't seem to get Codex to function anywhere near the level that I was previously able to get with even Claude Sonnet. Does Codex just not work well with Windows yet?
reply
sroussey 13 minutes ago
Have you tried using superpower skills?
reply
penetrarthur 51 minutes ago
I got the codex to write near perfect code with a somewhat strict agents.md and coding standards (a separate .md file referenced from agents.md). My .md files have examples and a long list of do's and don'ts I accumulated over the last 6 months or so, totaling 300-400 lines. I plan every feature with it until I am satisfied with the general approach it wants to take, and then it oneshots it in 95% of cases. The planning takes anywhere from 5 to 30 minutes. The actual execution has gotten stupidly fast; most of the time it is faster than making a cup of coffee.
reply
acmecorps 24 minutes ago
would you mind sharing your *.md files, for someone who is new at this?
reply
someguyiguess 2 hours ago
I've had the exact opposite experience. For various reasons, I've had to move from Claude to Codex and the rate at which it burns tokens for the same output I would get from Claude is ridiculous. I'm probably burning tokens at a rate that is at least twice as much as I was when using Opus 4.5 for coding tasks and still finding that just manually coding is easier than trying to get Codex to write functional code.
reply
greenavocado 3 hours ago
How smart a model is varies hour over hour, tracked over here: https://aistupidlevel.info/
reply
wsatb 3 hours ago
I guess enjoy it while it lasts? OpenAI won't be able to subsidize that forever either.
reply
windexh8er 3 hours ago
Agreed. I think the Chinese labs are proving that OpenAI and Anthropic don't have a moat in almost every aspect, especially pricing. I also think people are getting annoyed with the constant lift and shift. I've seen more folks drop Claude Code and Codex, specifically, because of the lock-in it provides the providers. I'm curious to see how people standardize on tooling adjacent and if Anthropic, Google or OAI move to block utilization akin to the games Anthropic has been playing as of late.

I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current ecosystem of walled garden. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ecosystem of ARM and Linux are able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.

reply
lurking_swe 24 minutes ago
is there an alternative to codex that “just works”? by just works i mean i can install as an app in 1 minute, and i get web search, skills, mcp servers, etc? Bonus points if it can control my chrome tabs like codex can, and if it offers remote control from my iPhone (chatgpt app) so i can kick off tasks while i’m out for a walk. Even more bonus points if i can, with 1 button click, share my chats or share the results of a session as a “site” (vercel style).

I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?

reply
Qhemlomo 54 minutes ago
Big companies are not doing OpenRouter.

My company has an agreement with the big providers and while i'm pretty sure they think about how to get budget back, it's a competitive advantage and normal people will not learn different model behaviours.

At least for now.

reply
esperent 39 minutes ago
What lock in does codex have? I'm using it in the pi harness specifically because it doesn't have much in the way of lock in.
reply
maxdo 2 hours ago
I see exactly the opposite. Chinese models fail under any complex scenario, while US labs raise the price, that's a sign of confidence.
reply
re-thc 2 hours ago
> while US labs raise the price, that's a sign of confidence

Regardless of what others are doing, US labs here are just rushing to IPO. It's NOT a sign of confidence.

It's the equivalent of saying you have confidence in SpaceX making revenue by renting out their data center (instead of their AI making bank).

reply
maxdo 42 minutes ago
going to IPO is a sign of confidence, you need to report a lot of things, that private companies don't. This is an exact reason chinese labs do not rush to go public. They wish to go, but the money flow is not as good.

On the same note. if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

reply
maxdo 29 minutes ago
running so much compute at that scale is not a junior task. weird analogy
reply
re-thc 35 minutes ago
> if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

If you get hired as a staff engineer and do the work of a junior, what's wrong with that?

Clearly xAI (now part of spaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth at most billions not trillions.

> going to IPO is a sign of confidence, you need to report a lot of things, that private companies don't.

This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.

> This is an exact reason chinese labs do not rush to go public.

Maybe or maybe not. If you are referring to Chinese labs - both the Hong Kong and China stock markets are way weaker than Nasdaq. It's not comparable. Check all the recent Hong Kong IPOs that have tanked.

So no, the reason not to might just be: no money in it.

reply
flatline 3 hours ago
I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models. We've got near-frontier capabilities from open source models from China at pennies on the dollar compared to US big tech rollouts. OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
reply
andrewmutz 3 hours ago
Both can be true. They can be charging what the market will bear, and still be charging less than their costs of running it.
reply
wyre 3 hours ago
There is no way I'm believing DeepSeek can charge less than $1 USD for their pro model while Opus costs over 25x more, yet their price is less than the cost of running it?
reply
kube-system 56 minutes ago
It would seem strange if they were operating in the same economy, but they aren't. DeepSeek operates in an economy with a high degree of central planning.

China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.

For example: https://www.boc.cn/aboutboc/bi1/202501/t20250123_25254674.ht...

reply
re-thc 26 minutes ago
> There is no way I'm believing DeepSeek can charge less

Why not? Hetzner charges WAY less than AWS too. Can you not believe that?

reply
schaefer 2 hours ago
> I don't think anyone has a firm grasp on actual inference costs.

There are huge numbers of users (myself included) that do have an exact idea of what inference costs are - on open models. We can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.

If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.

[1]: https://openrouter.ai/rankings#leaderboard

reply
InsideOutSanta 2 hours ago
> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models

We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither are most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.

reply
dontlikeyoueith 3 hours ago
> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.

reply
pimeys 2 hours ago
We pay by token at work. I just finished one session with Opus that was 4000 dollars. In about three days.

Now that 200USD subscription starts to feel cheap...

reply
zozbot234 2 hours ago
That would be about ~300 tok/s over 72 hours at Claude Fable output token prices? I'm not sure that this passes a sanity test.
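A rough sketch of that sanity check, using only the figures quoted in this thread ($4000, about three days, ~300 tok/s). It backs out the per-token price those numbers would imply rather than assuming any published rate, and it treats the whole bill as output tokens, which is a simplification: a real session also pays for input and cached tokens.

```python
# Back-of-envelope check using only figures quoted in this thread.
SPEND_USD = 4000          # session cost reported upthread
HOURS = 72                # "about three days"
TOKENS_PER_SECOND = 300   # sustained rate estimated in the comment above


def implied_price_per_mtok(spend_usd, hours, tokens_per_second):
    """USD per million tokens implied by a total spend and a sustained token rate."""
    total_tokens = tokens_per_second * hours * 3600
    return spend_usd / (total_tokens / 1_000_000)


total_tokens = TOKENS_PER_SECOND * HOURS * 3600
price = implied_price_per_mtok(SPEND_USD, HOURS, TOKENS_PER_SECOND)
print(f"{total_tokens:,} tokens, about ${price:.2f} per million tokens")
# prints: 77,760,000 tokens, about $51.44 per million tokens
```

If the bill were mostly input or cache reads instead, the same spend would correspond to far more tokens, which is one way parallel subagents could plausibly account for it.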
reply
unholiness 2 hours ago
Subagents are a helluva drug.
reply
esafak 21 minutes ago
That's the price of several engineers!
reply
rubyn00bie 2 hours ago
Just outta curiosity, as I’ve never gotten a spend anywhere near that, what variant were you using? Like max context window and fast mode? Or was it just chugging along non stop for three days?
reply
pimeys 2 hours ago
Fast mode, max context window. The task was: replace all 1600+ queries from one database to another and make the whole integration test pass. We did multiple passes, with different concerns when changing from one database to another. My OpenCode session right now says $4,365.02.

I haven't gotten close to this either before, but now we wanted to move fast because this branch gets conflicts all the time and we want to get the migration over with asap.

reply
rglullis 22 minutes ago
It's a bit of a left field question, but I am curious: Let's say the company wasn't paying the whole bill but only subsidizing it - e.g., if it paid 90% of the $4000. What would you do?
reply
logicchains 2 hours ago
We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.
reply
MichaelMedbed 3 hours ago
[flagged]
reply
kllrnohj 3 hours ago
regardless of whether that's true or not, US companies doing hosted inference of the models coming out of China are also significantly cheaper than those from OpenAI or Anthropic
reply
polski-g 3 hours ago
Not relevant to the post.
reply
jrflo 12 minutes ago
Oh for sure. I've been hopping around from provider to provider for the last few years just depending on who has the most capable / subsidized plans at the moment. I definitely expect there will be a squeeze on subscription costs all around the industry post IPO.
reply
pyeri 2 hours ago
My bet is they'll keep subsidizing for a considerable period of time, at least 1-2 decades more.

Most AI companies are just testing the waters with paid tiers right now, their greatest fear with increased pricing is folks reverting back to wikipedia, stack-overflow and other public domain organic activity buzzing back to life; that will kill any RoI potential in LLMs forever. They're playing the wait game instead, observing how the digital sphere reacts to every little increase in price.

If that weren't the case, they'd be pricing at lucrative premiums already, and would even have gotten away with it in the short term considering the increased dependency in the enterprise world. But that'd be like killing the goose that lays the golden eggs too soon and losing all long-term potential.

Once the folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and coming up with an article draft feels like reinventing Egyptian glyphs, that's when the real pricing hammer will come.

reply
wsatb 2 hours ago
Anthropic and OpenAI won't be around in 1-2 decades if this is their long term plan. People are not going to revert, but go elsewhere. China is proving that it can be done cheaper.
reply
raffael_de 37 minutes ago
1 decade = 10 years ...
reply
ChrisMarshallNY 3 hours ago
I'm planning on switching from the $20/month to the $100/month plan.

It's worth it, and I can afford it, but I am not really the right type of user for token-based usage. It's all for personal and free work.

reply
micah94 3 hours ago
Just a personal anecdote but I have not hit any more thresholds or limits since switching to the MAX plan and so far, it's been worth it. But I do wonder how long even this will last...
reply
ygjb 3 hours ago
I think subscription models are sustainable, but longer term, we should probably expect to see more prompt optimization happening in the providers' inference pipelines. For example, unless you explicitly tell the agent or API to use a specific model, fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically selecting the lowest cost model, would probably already save a lot of money (IDK if Claude/OpenAI do this on the backend, but several services I have worked on do some things like this to reduce the cost of delivering customer-facing inference at scale).
reply
Majromax 2 hours ago
> fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically selecting the lowest cost model, would probably already save a lot of money

Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.

Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.

Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
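To make the comparison concrete, here is a small sketch using only the two per-MTok prices quoted above. The 200k-token context is a hypothetical session size, and the sketch only prices re-reading the existing context on the next turn; output tokens and any later cache reads on the cheaper model are left out because the thread gives no figures for them.

```python
# Prices as quoted in the comment above (USD per million tokens).
OPUS_CACHE_HIT_USD_PER_MTOK = 0.50
HAIKU_UNCACHED_INPUT_USD_PER_MTOK = 1.00


def context_read_cost(context_tokens: int, usd_per_mtok: float) -> float:
    """Cost of feeding the existing context to the model once, at the given rate."""
    return context_tokens / 1_000_000 * usd_per_mtok


context_tokens = 200_000  # hypothetical mid-session context size

stay = context_read_cost(context_tokens, OPUS_CACHE_HIT_USD_PER_MTOK)
switch = context_read_cost(context_tokens, HAIKU_UNCACHED_INPUT_USD_PER_MTOK)

print(f"stay on Opus (cache hit):    ${stay:.2f}")    # $0.10
print(f"switch to Haiku (uncached):  ${switch:.2f}")  # $0.20
```

At these rates the turn on which the switch happens costs twice as much for the context read as staying put, whatever the context size, so a router has to expect enough cheap turns afterwards to earn that back.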

reply
zozbot234 2 hours ago
> The K-V cache of a model is intertwined with the model's configuration.

I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.

reply
ygjb 2 hours ago
There is a definite financial incentive for people smarter than me to solve the problem, and I don't generally bet against businesses finding ways to reduce costs :)
reply
wahnfrieden 2 hours ago
ChatGPT does this and codex will eventually. They’ve stated it’s the future.
reply
rnxrx 3 hours ago
I have the $100 plan and had almost never run out of credits until I started using the ultracode / workstreams feature w/Opus 4.8, at which point I managed to blow the full 6 hour allocation in like 20 minutes or so. In fairness, it did some amazing things with the extracted information, but it also strongly suggested that I'd need the $200 subscription *plus* a budget for extra usage.
reply
andai 3 hours ago
A few weeks ago they massively cut usage on free tier.
reply
gck1 2 hours ago
Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Anthropic wanting to switch billing to API rates is them just wanting to generate more profit.

reply
InsideOutSanta 2 hours ago
> Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Even if subscriptions are locally profitable (i. e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.

reply
gck1 2 hours ago
I can see that being true, and it very likely is true. But isn't infinite VC money and no incentives to optimize operations the reason behind that?

Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware, they have no unlimited funding, so they try to optimize things.

And Anthropic is complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.

In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.

reply
re-thc 2 hours ago
> Take a look at China for example - they have no access to NVIDIA

Not true. Stop following US media spam if needed.

1. Very recently, the US did close a loophole on sanctions that allowed Chinese companies to use NVIDIA hardware outside of China i.e. before that was closed they all had access. The trick was train outside, do adjustments, ship the disks back and use non-NVIDIA in China, but at least the training and endpoints not hosted in China could all use NVIDIA.

2. There's been plenty of reports including fines and bans e.g. to Supermicro on smuggling NVIDIA hardware to China. I doubt it has been stopped. You can't catch everyone.

reply
wsatb 2 hours ago
"Nothing is subsidized" is a wild take. They might be making money on some users, perhaps even most users, but certainly not all. Also, "subsidized" doesn't just mean on compute.
reply
y1n0 2 hours ago
That's interesting. Do you have anything to back that claim up?
reply
gck1 2 hours ago
I do, and it's called DeepSeek's pricing table. At the same time, "subscriptions are subsidized" cohort have no data whatsoever, and yet they're in every thread.

Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.

DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.

reply
nickthegreek 2 hours ago
> Granted, it could still mean that Anthropic just chooses to lose money -

Then the cost is being subsidized by investor capital, but it is still subsidized.

reply
FrustratedMonky 2 hours ago
"Nothing is subsidized"

So they are profitable?

I think you are mismatching accounting terms.

You can't say the 'subscriptions' are profitable without accounting for the cost of making the model that is the source of the subscription.

They are heavily subsidized by the shareholders. Investing, running at a loss, with hope of some future profitability.

reply
gck1 2 hours ago
And yet, that is completely uninteresting to their user base.

If a saner factory can sell you the same tool at a fraction of the cost of a gold plated factory, your choice is going to be obvious.

reply
cortesoft 2 hours ago
I have been using both codex and Claude in my day to day, trying to not get too attached to one. I want to be able to work with any provider in case one of them does something bad.
reply
rvshchwl 3 hours ago
I've found Codex to be the better subscription for OpenClaw, because the limits are indeed very generous. However, I've found more and more that Claude Routines/Scheduled agents can replace all the tasks I use OpenClaw for, so I've been slowly switching over to Claude Code. Aside from OpenClaw, I don't find a lot of value in Codex as a harness on its own.
reply
efromvt 2 hours ago
I do slightly prefer 5.5 for complex work but Claude quota usage has gotten infinitely better since the dark days a few months back - has gone from being infuriating to something I pretty much don’t have to worry about with it as a daily driver. (In fact, hitting GPT weekly quotas is more annoying now). Understand if people are still scarred by the issues + poor comms around them, though.
reply
jrflo 11 minutes ago
That's good to hear. It was legitimately unusable back when 4.7 was released, so I had no choice at the time. I'm sure I'll ping pong back again at some point.
reply
knuckleheads 3 hours ago
I feel like Codex made a big push to run everything on your laptop. With Claude, I get 4 CPUs, a fair amount of ram and 30gb for every one of my dumb ideas for free in the cloud containers. Codex used to be similar, but last time I tried it just kept pushing me to run it locally on my laptop, which I really did not want to do with 20 requests going at once. That's the main advantage for me at the moment.
reply
simjnd 3 hours ago
What runs in cloud containers? The dev servers, builds, etc.? I tried to quickly glance at the Claude website and it doesn't mention cloud containers on their pricing page.
reply
zhshhan 3 hours ago
"cloud containers" do you mean Claude Code on the web? Codex also has similar Codex cloud.
reply
knuckleheads 2 hours ago
Yes, correct, they both have the same capabilities, however it felt like codex was pushing me harder to use my local desktop in an annoying way, while claude code was happy to spin up a bunch of dev containers for me in the cloud.
reply
supertroop 2 hours ago
Do you use a token service like open router or just subscribe to / unsubscribe from various models sequentially?
reply
jrflo 9 minutes ago
I just subscribe/unsubscribe to the providers each month. I'll definitely check out open router though, I always assumed that subscriptions were heavily subsidized by the providers especially if you're on the top end of users but maybe I should go to a usage-based plan.
reply
dd8601fn 3 hours ago
I have trouble justifying gpt after that gross stuff with the war department.

Though the day is coming when there’s no distinguishing, I’m sure.

reply
beering 2 hours ago
Right now there are Anthropic engineers deployed in the NSA to help them use their cyber models. The NSA is part of the department of war.
reply
lovich 2 hours ago
pedantically, the defense department.
reply
jcbrand 2 hours ago
"War department" is the older name, not "Defense department".

Also, is it really a defense department when you're starting wars of aggression every 15 years or so?

reply
derektank 2 hours ago
The War Department has not existed since the passage of the National Security Act of 1947 and the government department has been known as the Department of Defense under US law since the act was amended in 1949. If you have an issue with it, take it up with Congress.
reply
scosman 37 minutes ago
They actively use the name https://www.war.gov
reply
toraway 22 minutes ago
Changing a domain name doesn't actually amend federal law.

Just like how changing Kennedy Center letterhead to Trump Kennedy Center for a year didn't actually legally rename it.

Once a case with sufficient standing got in front of a judge it reverted to the actual legal name on the basis that only Congress can change the statutorily defined name.

reply
ProofHouse 33 minutes ago
100% I constantly get errors and timeouts on single responses in Claude, and certainly hit limits all the time. Codex rarely. In fact, I bought a second $200 Codex plan because the quotas seemed fair and I didn't have constant issues. Claude is so great at a lot of things, but unfortunately Anthropic beats you away with a stick every chance they get.
reply
shimman 3 hours ago
I've only ever had the $20/month claude plan, but last night I took the time to set up opencode + openrouter, paying for deepseek + glm. In my previous experience, which was extremely awkward, I'd hit my limit within one or two chat replies and it'd take me like 4 limit cycles to complete my task. Now I'm able to complete an equivalent task for less than $2 in two cycles (ask -> revise).

I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).

Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.

reply
simjnd 3 hours ago
I've switched from OpenRouter to using Deepseek directly from their platform since OpenRouter providers were pretty flaky and inconsistent.

Deepseek V4 Flash is surprisingly capable and insanely cheap. It takes so much to get the session cost up to $0.01.

reply
efromvt 58 minutes ago
The openrouter provider flakiness with deepseek was infuriating, but I’m happy in hindsight because direct deepseek has been very pleasant. Shocked by how low spend is.
reply
nozzlegear 3 hours ago
> and violating the social contract.

I agree with you on pricing, but what do you mean by this?

reply
shimman 3 hours ago
Sure, modern American corporations care more about hoarding wealth rather than helping build up US society. Once neoliberalism became the mainstay economic position of the US income inequality has skyrocketed, healthcare costs have increased, childcare is more expensive than university, housing has become both unaffordable + unobtainable. By simply existing costs have increased while life becomes unstable.

Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers? Why aren't they encouraging unions or sectorial bargaining? Why isn't the government mandating any of this?

Americans very rarely benefit when US corporations do well. That needs to change. No one benefits if Meta continues making billions in profit every quarter while society suffers from isolation, depression, suicide, and scams from their services. Americans don't benefit if health insurance companies are making massive profits while they can't afford deductibles.

Our society has been set up to simply extract wealth in all facets of life. That's a sick society and it needs to change.

I'm not saying China does this better, in fact China has some of the worst worker rights out of all the industrialized countries; but at least American consumers would benefit from cheaper higher quality Chinese goods. The world would likely benefit too if America got off the cold war hype train that did nothing to benefit humanity outside of those making weapon systems.

reply
joxdosba 2 hours ago
> Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers?

The AI companies sure are a brilliant example of corporations needing to do more to help their employees pay for childcare.

reply
idiotsecant 2 hours ago
It's more useful to everyone when you engage with the strongest part of someone's argument
reply
rekttrader 2 hours ago
Wait till you kick the tires of Qwen Coder.
reply
joshstrange 3 hours ago
I would not use this if you are on a subscription. In <8min it burned my entire 5hr window (which had just reset, it appears; I have over 4 hours till it resets, and I hadn't used CC at all today aside from this) and then it used up ~$15 more in usage before I could stop it.

I am on the $100 Max plan.

reply
d4rkp4ttern 4 minutes ago
Yes, and this is also why I haven’t yet tried the new “dynamic workflows” which spawn hundreds of agents that happily eat through your token limits.
reply
observer987 5 minutes ago
I too am on the $100 plan and I second this.

I had it analyze a project I was working on with Opus 4.8, and it blew through 23% of my session limit in one go. Does not portend well for my budget.

reply
GoToRO 46 minutes ago
They have a graph with a cost comparison between the models. This model is just a little over the other models in cost. The graph is logarithmic :)
reply
cortesoft 2 hours ago
The CLI when you select it says it has 2x the usage as opus. Not sure if that matches what you are seeing.

I do wonder if you switched models mid-session, you would have lost all your cache. Reloading the context into cache can really eat through your usage.

reply
fastball 2 hours ago
What is your effort level?
reply
enraged_camel 2 hours ago
That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.
reply
FireBeyond 21 minutes ago
Well the $200 Max plan is 4x the usage quotas of the $100 so it's "within reason"?
reply
treenutlog 2 hours ago
[dead]
reply
ZunarJ5 2 hours ago
They didn't even reset credits for this lol
reply
0erofootprint 3 hours ago
For me it almost immediately blocked. I had it writing code related to message digests - and it seemed to think it was too gifted for that. Gave the security warning and switched back to 4.8. Whatever... it will probably have the API error soon. I have mostly switched to the Codex 200 a month plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic's, where it happens almost every hour.
reply
matheusmoreira 2 hours ago
I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.

> Fable 5's safety measures flagged this message for cybersecurity or biology topics.

> They may flag safe, normal content as well.

> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.

Here are the results of the agentic code review session:

  ┌──────────────────────────┬───────────────┬────────────────┐
  │          Agent           │ Fable 5 turns │ Opus 4.8 turns │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ values                   │ 134           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ data-intrinsics          │ 104           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ tools-tests-build        │ 81            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ core-intrinsics (failed) │ 25            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ system-memory            │ 44            │ 20             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ reader-modules           │ 104           │ 25             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ linux-startup            │ 95            │ 15             │
  └──────────────────────────┴───────────────┴────────────────┘
This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.
reply
kkoncevicius 3 hours ago
I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that. And switched back to Opus 4.8.
reply
smith7018 3 hours ago
Fwiw it's not available on my enterprise account: "Disable zero data retention to unlock Fable 5 access"
reply
stronglikedan 3 hours ago
We just blocked it at our org for this reason. They will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."
reply
sdellis 3 hours ago
What does "zero data retention" mean? What kind of data does it need to unlock?
reply
drakythe 3 hours ago
The announcement details it. They're storing 30 days of data on all surfaces, first and third party. They claim it is for security purposes so they can review and check for long term jailbreak and distillation efforts.

They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).

reply
jrumbut 55 minutes ago
It could be my use cases, which have always seemed to be outside the wheelhouse of these models, but I find it very hard to downgrade after accessing a more capable model.

Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).

Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.

So it might work if it turns out to be a substantial leap in capability.

reply
GoToRO 44 minutes ago
I switched back to Sonnet. It replies faster so I work faster. Also cheaper. But I really like the speed. I have to be more specific with what I want. Also I stop it more often than Opus. These new models will be awesome, but they need to increase the speed.
reply
spaceman_2020 27 minutes ago
Kimi 2.6 has been my workhorse now. It's as good as Opus 4.6, which, to me, was the last "useful" Claude model.

The newer models are smarter but really fickle and hard to get meaningful work out of

4.6 was a workhorse

reply
ltrg 2 hours ago
Fable seems very good at finding bugs (unsurprising given Mythos lineage), so this seems a pretty smart strategy. Once you see the bugs it finds in your existing Opus code, it's going to be hard to go back, psychologically speaking.
reply
kyledrake 4 hours ago
Considering their apparent nerfing of the end user plans in favor of enterprise clients, is Anthropic still the "more ethical AI company" like everybody loves to tell me all the time?

Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.

reply
estearum 4 hours ago
You really misunderstand what AI-doom people are worried about if you think this is anywhere near the top (or middle, or bottom) of the list of concerns.
reply
Jackson__ 3 hours ago
If you can't trust them to act ethically on the small scale, why would you expect that to turn around once it gets to a larger much more important scale?

How many government sanctioned school bombings does it take for them to quit working with said government? For now we know that number is somewhere between infinity and 1.

reply
estearum 3 hours ago
It literally does not register as "unethical" at any scale to have different products or prices for different customer tiers.

The question of collaboration with USG is a much more complex one, but is not the one raised above.

Edit: I'll also add that I doubt any AI-doom people "trust" Anthropic per se. The entire angle of questioning – again – misunderstands the AI-doom argument. You appear to think that if companies behave unethically, they cannot be trusted and they will not produce good outcomes, inversely: if they behave ethically, they can be trusted, and they will produce good outcomes.

Any competent AI-doomer would argue that ethics or trust are essentially irrelevant.

The entire problem is that people can act totally reasonably, even ethically, and this is not a guarantee of good outcomes. Situations can be created in which completely ethical, reasonable behavior actually produces a bad outcome. You do not need to assume people are bad in order to produce a bad outcome, and inversely you cannot assume that you will get a good outcome from good people.

"Arms races" are one class of situations that often have this characteristic. "Bureaucracy" is another class that we encounter a lot in daily life. There's a lot of them!

reply
throwaway894345 3 hours ago
Yeah, it's positively precious to think the specific pricing strategy for consumers is the overriding ethical concern with OpenAI, etc. I don't have any particularly strong affinity to any AI company, but comparing pricing to say mass surveillance is ... something else.
reply
kyledrake 3 hours ago
Your beautiful straw man is negated by the fact that Anthropic seems quite eager to get back on the DoD gravy train https://www.reuters.com/business/aerospace-defense/blacklist...
reply
jnovek 2 hours ago
Your original comment was about pricing ethics, does Anthropic’s connection to the DoD have anything to do with pricing ethics? They’re in no way coupled, one can be ethical while the other is not.
reply
andriy_koval 2 hours ago
Even with the Pentagon thing, Dario said he doesn't object to military AI, but said Claude is not ready YET. I speculate he was afraid of the reputational damage if Claude were to guide missiles onto elementary schools.
reply
throwaway894345 49 minutes ago
I admire the confidence with which you started typing a reply that had nothing to do with my comment. Bravo!
reply
ygjb 3 hours ago
Setting aside the simple fact that there is no ethical consumption under capitalism, the reality is that regardless of how Anthropic feels, it is becoming clear that many, if not all countries regard AI developments as strategic technologies (and they should).

Anthropic needs to be at least somewhat in the good graces of a capricious administration that is already under pressure from businesses and citizens to regulate AI companies across multiple different domains, whether it's energy consumption, job displacement, military and defense applications, surveillance, etc.

If Anthropic wants to survive, they need to acquire influence with the government that most impacts them as an American company, and a massive exporter of services in the AI space to other countries, otherwise they could get locked down and locked out of the market for national security reasons.

It sucks, but sometimes the survival choice is to make an ethical compromise in hopes that you can still be around to make better decisions later.

reply
ericmay 3 hours ago
> Setting aside the simple fact that there is no ethical consumption under capitalism

This "simple" fact needs quite a bit of additional context and work. Making grandiose ethical claims like this can be countered with other grandiose claims such as the fact that there is no ethical existence under communism or socialism.

reply
ygjb 3 hours ago
Sure. Why not, I'm bored today and waiting for some stuff to finish up :D

The fact that there is no ethical consumption under capitalism is not material to whether or not ethical existence is possible under communism or socialism. In order to survive in a capitalist society, one inherently has to make choices that require trade-offs, and those trade-offs are burdened by a history of decisions made not just by the people alive today, but our ancestors as well. Does that mean I walk around chanting "Reparations", "Land-back", or other calls to action? No, but I do acknowledge that there are unresolved issues and as a Canadian, I know we need to do more to resolve treaty issues, and environmental issues, and systemic discrimination. I also know that Americans need to do better to address systemic discrimination and many, many other issues. It also doesn't mean I want to give back my house, or give away all of my possessions. It just means I try to make good choices and support businesses and people that are open about the trade-offs they make and try to engage as ethically as possible.

Acknowledging those facts doesn't absolve us of responsibility, it's a framework that allows folks concerned about whether or not they are doing the right thing to accept the trade-offs that they choose to make and be responsible and accountable for those choices to themselves or their communities.

We live in a world with scarce resources. It's possible that with a foundational redesign of the global economy, and the requisite authoritarian government that would be required to force such a redesign, we could eliminate food scarcity, solve energy scarcity, and make sure that everyone has a place to live. Those trade-offs are probably not worth the ethical cost in political and physical violence required to accomplish it. We have seen the trade-offs that happen when the powerful are able to exploit communist or socialist governments. We are seeing the "late stage capitalism" impacts of allowing the powerful to exploit capitalism in democratic societies. Acknowledging that the current capitalist system has led to the greatest prosperity for the upper echelon (financially) of humanity, and a dramatic reduction in global poverty shouldn't obscure the reality that much of that wealth comes from exploitation of people and the environment.

It's a huge problem to unwind, and we can't let the burden of every choice that we make stop us from trying to do better, but we (as in society in general) can't do better if we don't at least acknowledge the compromises we are making along the way, and try to plan to fix it in the future.

Probably a topic better suited to beer and a pub setting than HN though :P

reply
ericmay 33 minutes ago
> The fact that there is no ethical consumption under capitalism

I don't believe that this is a fact. How are you demonstrating that this is a fact?

When you talk about things like reparations or "land back" you're already cargo-culting in concepts and ideas that themselves need to be fleshed out in order to make a subsequent claim that a specific economic system is unethical. Someone can just argue all economic systems are unethical, how are you going to defend against that? And can you pay reparations for example without going back in all of human history and finding all cases of injustices and then tallying it up? Why pick an arbitrary point in time? Better yet, why not start in countries where slavery still exists instead of focusing on the west which led the world in abolishing slavery and created concepts such as universal human rights.

Even with respect to "eliminating food scarcity" - eliminate in what sense? All olive groves and grapevines and rice farms have to be destroyed and rebuilt to only build certain foods?

Dabbling in communism or other inhumane and authoritarian governmental systems is extremely dangerous and in the same vein of extraordinary claims require extraordinary evidence, suggesting as you did creating an authoritarian government to create a utopia is precisely the same project of suffering and death that mass murderers throughout history have undertaken to abject failure, and thus, you need some incredible amount of evidence and theory to be able to even fairly suggest going down this path.

reply
ygjb 3 minutes ago
It's simple, I am not going to defend any economic system because they all require trade-offs, because any economic model that we could currently implement must necessarily ration scarce resources according to some set of rules. Those rules will explicitly deny someone else resources, and the administration of that economy will also be subject to abuse by the people who enforce the rules.

I am not going to do the work of gathering the evidence for you, and I don't think this is the right venue for a debate on the topic.

reply
cleaning 2 hours ago
It only needs additional context and work if you are unfamiliar with the concepts underlying it. Possibly consider you are out of your depth here, rather than jumping to conclusions.
reply
ericmay 43 minutes ago
No that's incorrect. Instead I believe the underlying concepts are debatable and so stating it as a "simple fact" is a bit unfair.
reply
estearum 3 hours ago
Where is your evidence that this is Anthropic backtracking on its ethical and contractual commitments rather than DOD backtracking on its blatantly illegal coercion (which it's almost certainly going to be successfully sued for)?

Talk about a strawman!

reply
kyledrake 3 hours ago
As someone that was in Minneapolis during the ICE raids, including one where a US citizen at a nearby restaurant was thrown in prison for 3 days despite having his passport on hand because he looked asian, it's hard for me to not equivocate the ethics of AI companies actively collaborating with the Trump administration as different flavors of ice cream.
reply
estearum 3 hours ago
Are the two analytical frameworks available to you just "black and white thinking" or "it's different flavors of ice cream?"
reply
kyledrake 3 hours ago
Are the personal attacks really necessary to make your argument?
reply
estearum 3 hours ago
Fair point! Edited to remove.
reply
DonsDiscountGas 3 hours ago
I don't think offering a product under a certain set of terms obligates a company to maintain that offering forever. The bait and switch is certainly annoying but seeing as they're very upfront about it you can't say you weren't warned. Don't like it? Don't use it.
reply
dllrr 19 minutes ago
They said they would release it back into subscriptions as capacity allows in the future. If they don't, people are going to point back at it and rake them over the coals.
reply
MattSayar 44 minutes ago
It smells like an architecture-related issue to me. They wanted to release the model asap, but they're still implementing the fine-grained controls to constrain the model to non-subscription users.
reply
eli 2 hours ago
It's unethical to price it in a way not everyone can afford?
reply
wongarsu 3 hours ago
I wouldn't call Anthropic ethical. But between Anthropic and OpenAI, Anthropic is the more ethical one
reply
brianmcnulty 4 hours ago
Why would you have ethics when you could get that IPO money instead?
reply
xvector 4 hours ago
Yup - who cares about x-risk or red lines for domestic mass surveillance anyways? I draw my red lines at prioritizing profitable customers when heavily resource constrained. That's the true definition of evilness!
reply
Maken 4 hours ago
The bar is just too low.
reply
fridder 4 hours ago
More ethical in some areas, actively user hostile in others
reply
thisisit 14 minutes ago
One can hope it helps Claude to figure out how to solve their buggy payment system - otherwise how do I pay for these credits.
reply
KronisLV 54 minutes ago
> it feels like they are trying to get subscribers to switch to usage-based billing

I think they might be hitting a point where subsidizing the expensive models for subscriptions makes less and less sense.

With Opus 4.X, last month I paid 100 USD for the Max subscription and got a token equivalent of 4.1k USD.

I imagine that Fable is more expensive to run.

reply
nickandbro 4 hours ago
Get them addicted then cut them off. Oldest trick in the book.
reply
toomuchtodo 4 hours ago
More of a free trial to those authenticated and qualified with existing payment. Subscription billing is going away for sure though eventually based on the economics. Token “all you can eat” is a capital furnace otherwise.

(I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)

reply
chinathrow 12 minutes ago
Yeah that payment scheme sounds like they gear up to shift everyone into API token prices, eventually. Time to convert the existing tokens into software, until then.
reply
CuriouslyC 4 hours ago
Subs lose money on individuals to get those individuals to force their companies to pay for the corporate plan. The economics are bad, but so are the economics of grocery stores selling Milk and Bananas at a loss to drive traffic, which they basically ALL do.
reply
eptcyka 2 hours ago
I pay a lot but barely use it except for some intense days, where the lower plans would have throttled me in like 30 minutes. API billing is still more expensive. If you want to not pay much, go to openrouter and use chinese models. They are cost efficient.
reply
HDThoreaun 3 hours ago
I haven't seen any evidence showing that subscriptions cost the labs money.
reply
toomuchtodo 4 hours ago
Companies don’t want to pay when the value realized does not exceed the cost.

AI Savings Misses 'Should Be Making Executives Uncomfortable,' Bain Says - https://news.ycombinator.com/item?id=48359010 - June 2026 (0 comments)

AI sticker shock hits corporate America - https://news.ycombinator.com/item?id=48307098 - May 2026 (146 comments)

reply
CuriouslyC 4 hours ago
What's the realized value of not losing your engineers because you're letting them use their preferred tools?
reply
toomuchtodo 3 hours ago
Retain and hire the engineers who don’t require heavy use of AI to deliver value? The current SWE job market speaks for itself. Where will you go where they will let you burn up tokens in a high cost of capital macro?

ZIRP (zero interest rate policy) is over, software engineers no longer call the shots now that there isn’t vast amounts of capital chasing yield, and that capital bidding up salaries and keeping the labor market for engineers tight.

If you are x more productive with generative AI, very shortly you are going to have to prove it with a token budget (or, if you’re lucky, an org willing to spend for on prem hardware for capped token cost, fixed capex vs uncapped opex).

The comparison is not SWE vs SWE with AI. It is SWE vs SWE with AI with a constrained token budget ($x/month) delivering the same value at the same or lower cost. If you cannot prove that you are wildly (vs marginally) more productive with the AI, why would they pay for it? Prove it.

reply
alvis 4 hours ago
It’s too obvious that Anthropic needs to find a way to earn enough revenue before IPO. Claude subscription isn’t earning much money I bet
reply
sigmoid10 4 hours ago
I think they are just prioritizing enterprise customers, because this is where historically they made most money.
reply
dylandevelops 3 hours ago
I agree with you here. Unfortunately, this tends to be the case, with smaller developers paying the price.
reply
sdellis 3 hours ago
That's a big problem for all of the AI companies. Most people don't find the technology compelling, accurate, or ethical enough to pay for a subscription.

Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?

reply
AtlasBarfed 3 hours ago
That's not how it works. They don't need revenue, they need addicts.

Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.

Yes they can't afford to give the products for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.

reply
sdellis 2 hours ago
But the only companies firing people (and certainly not everybody) are either the companies with an AI or the investment and finance firms that stand to profit from AI. I smell hype. And no company is firing everybody because of A.I.

I agree. They need addicts, but they are high on their own supply and everyone else can see the danger in getting hooked.

reply
xpct 4 hours ago
I agree, this looks like their plan to wane out subscriptions. This will probably come with Opus nerfs later.
reply
rapind 4 hours ago
I just assume Opus is constantly nerfed based on capacity. I was exclusively Claude for a long time, but the inconsistency in quality, constant outages, and slow downs were too hard to work with.

I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and then the more hallucinations you then miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it only to discover it's too expensive to actually use.

reply
dylandevelops 3 hours ago
I agree with you. The more I vibe code, the less interested I feel in what I'm building. Working with models that force me to think, especially with personal projects, helps me stay engaged and enjoy what I am doing more.
reply
winter_blue 4 hours ago
Composer 2.5 Fast that Cursor is giving away for very little has been amazing.
reply
daviding 2 hours ago
Given the Fable 5 costs it's getting trickier to weigh up 'how smart do you want it', like looking at the top of this graph..

https://cursor.com/evals

reply
aplomb1026 3 hours ago
[flagged]
reply
nonethewiser 4 hours ago
It's possible that they will transition to usage credits but why not take them at their word? To date they have continued to offer better and better models to their subscription plans.
reply
timcobb 4 hours ago
What's their word? Have they commented?

Upd: I meant big picture, not with respect to this model release. Where do subscriptions figure into their strategic vision? Will consumers end up paying enterprise prices in the future?

reply
KyleJune 3 hours ago
In the blog post they say when sufficient capacity allows them to do so they aim to restore Fable 5 as a standard part of subscription plans and intend to do so as quickly as they can.
reply
dbbk 3 hours ago
Read it again
reply
timcobb 3 hours ago
I did, I'm not seeing anything about the future of subscriptions at Anthropic.
reply
ls612 3 hours ago
In TFA they say they intend to restore Fable 5 to subscription plans some time after June 22. That is what "take them at their word" means.
reply
taormina 4 hours ago
Those already landed! Oh, you weren't talking about 4.8?
reply
piva00 4 hours ago
Even Opus 4.7 felt like a regression from 4.6, consumed a lot more tokens while I didn't experience any substantial improvements. The company I work at simply rolled back to 4.6 on everyone's configurations, disabling the toggle for 4.7.
reply
taormina 3 hours ago
4.6 has been my happy place for getting anything done for a while now.
reply
xvector 4 hours ago
HN needs to take a chill pill. Could it be that Mythos is expensive and they just want to give people a taste of it? I mean the alternative is not offering it at all?
reply
8note 3 hours ago
it's unclear how they can offer it broadly but only for half a month.

why do they have capacity now that they won't in a few weeks?

reply
losvedir 3 hours ago
Break between training runs?
reply
bigtechennui 3 hours ago
It’s offered broadly after, for more money. It’s subsidized as marketing
reply
timcobb 4 hours ago
Ooof so are we thinking that in the next 6-12 months subscriptions will be replaced with paying retail like enterprise currently?
reply
CuriouslyC 4 hours ago
I don't think they'll phase out subscriptions ever, their whole play has been to drive demand from the bottom up. Get engineers hooked on building with claude at home, then get them to demand the ability to use it at work, and bend over their employer with no lube.

They'll probably tighten the quotas to rein in whales though.

reply
aseipp 3 hours ago
They almost certainly already make a fuckload more money off API pricing than they do subscriptions, even if there might be more total subscription users. So offering subscriptions even at some loss is probably going to continue. Honestly, I'd be surprised if they even lost money on most subs; there are definitely Token Whales out there who mess up all the accounting, though.

Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".

reply
timcobb 3 hours ago
But I don't want to be the developer who goes and says we must pay all this money for these tokens. I don't know who wants to be that developer.
reply
gck1 40 minutes ago
But how is this sustainable? It's not like paying $5000 per feature means you'll be refunded if prompting "make no mistakes" didn't work.

The only reason why I pay $200 is because LLM errors cost me that much, at worst. If "make no error" starts working - sure. But surely, unless you have millions of dollars of cash to burn, a coin flip that costs $5000 is an insane idea?

reply
thewebguyd 3 hours ago
I certainly hope not. PAYG is not predictable enough for smaller companies or individuals. Where I work (non-tech company), PAYG would never fly. We aren't big enough for that. Of course, you can set usage budgets, but there's a pretty big difference between $200/user/month vs. the equivalent PAYG usage being closer to $1,000/user/month, if you currently use the subscription plan to its limits each week.

Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.

OTOH, accelerating the push for local LLMs would also be fine with me.

reply
ygjb 3 hours ago
I doubt it, given the importance of those subscriptions for building and maintaining market awareness.

The AI landscape is changing rapidly, with Apple announcing the option to change the AI backend, and potential requirements to enable AI choices as well, similar to EU browser choice requirements (this is more reading tea leaves than any actual requirements I am aware of). The new OS changes coming to support Googlebook, and deep Copilot/AI integration into Windows will make maintaining user facing subscriptions essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.

If they don't maintain that relevance there is increasing likelihood that they will get consumed by other companies whether it's Apple, Microsoft or Google to form a foundation for their OS, or other cloud providers.

reply
timcobb 3 hours ago
That makes sense, but what about the specific bifurcation we're seeing here of super primo models versus still good models being available to subscriptions?

It's kind of annoying not getting access to the primo model and paying 200 bucks a month. I understand 200 bucks a month is basically nothing though.

Like I don't totally understand why they'd let me have it for a couple weeks and then take it away and say I can have it but I have to pay retail and retail is like $1,000 a day.

It's better to have loved and lost than to have never loved at all??

reply
ygjb 3 hours ago
It's a trade-off. Every hyperscaler is buying and building compute capacity as fast as they can dodge red tape. There is limited compute capacity, and scarcity is a real thing.

As a consumer I can choose to buy subscriptions to a range of things, including $5 droplets or VMs on a broad range of cloud hosting providers. I can even buy cheap bare metal at a bunch of providers at an affordable retail rate.

I can also buy "unlimited" AI packages that will be optimized to fit the cost model from a variety of services, with different impacts, such as rolling outages when I consume a daily or hourly allotment.

Right now VC and the investor class are subsidizing the rapid evolution of the services and availability, but that VC is running out. In more traditional economies, AI would have developed and rolled out more slowly, and through metered subscriptions, with the eventual rolling out of "unlimited" packages like telephone, internet, or cell services once the market became commoditized.

We have seen a big inversion of that with the race to "win" AI marketshare. Now the true cost is being exposed, and the most competitive and capable models are hideously expensive to operate, so it makes sense that we are moving to metered billing for a utility service. If you want gas, you can buy regular or premium. If you have a premium car you definitely want the premium, but for most people regular is good.

Give it a couple of years, and the survivors will settle around fairly industry standard models of consumer grade services, pro-sumer accounts, and business/enterprise models.

Things are still shaking out, but I get the sadness. Luckily I work at a big tech company who is banging the drum on doing experimentation so I use my prosumer claude pro and other accounts at home for hobby stuff, and save my heavy lifting and potentially experimentation for work :P

reply
sytelus 20 minutes ago
Enterprise subs are not allowed to use Fable if they have set up zero data retention :(
reply
madrox 52 minutes ago
I suspect it'll go on the subscription plan once other providers have similar benchmarks.

As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage it makes sense.

reply
irthomasthomas 3 hours ago
This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.

It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1

It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.

reply
nicce 3 hours ago
> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

Probably all about the IPO.

reply
dack 2 hours ago
i doubt that's the goal for them. i bet they just really don't have capacity for people using it a ton, yet they wanted people to be able to try it out while it's new. so they compromised and made it temporarily available. and then hope they can get costs down or capacity up so they can make it more available again
reply
InsideOutSanta 2 hours ago
I think the goal is "private citizens: subscriptions; corporations: per-token billing." It's getting people addicted to LLMs on cheap subscriptions so that they can then force companies to pay for expensive inference.
reply
irthomasthomas 2 hours ago
"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

reply
altcognito 2 hours ago
Where is this text coming from?

[edit] -- I see that this comes from the system card -- dang merged the comments from the other discussion so that explains the confusion.

reply
matheusmoreira 3 hours ago
This is really sad... I really didn't want to be priced out of these models but it looks like that's going to happen sooner rather than later.
reply
deepfriedbits 2 hours ago
Thankfully this, like most other tech, will get cheaper through the years.
reply
gck1 37 minutes ago
It already is. But marketing is a hell of a drug.
reply
Aleleo76 3 hours ago
Pay-as-you-go billing is a kind of drug, I use it every now and then when I'm working on a project with Opus, in a moment you spend a fortune
reply
ABS 4 hours ago
also: Fable takes 2× the usage of Opus
reply
oersted 3 hours ago
> Pricing for both models is $10 per million input tokens and $50 per million output tokens.

The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.

Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.

reply
kmac_ 17 minutes ago
Models are getting better, but there's a negative change in terms of "productivity" per dollar. Yeah, I can throw 5 sub-agents at the problem, but the cost is getting significantly higher. And yes, I can crank out the solution much faster, but again, at some point that cost will be hard to justify. And it doesn't matter if the cost is subsidized by a provider, if it's paid by your company, or from your pocket. We are slowly reaching a point where the cost will be too high to justify the gains.
reply
xyzsparetimexyz 3 hours ago
This is probably the end of 'use the best model no matter the price'
reply
kolinko 3 hours ago
The pricing can be a bit deceptive though. A good model can deliver the same results in fewer tokens.

Kind of like billing a programmer by the hour.

reply
zyuiop 2 hours ago
Sadly this does not seem to be the case here: if you read the announcement entirely, they include a "cost per task" metric which basically continues the trend of their previous models. So yes, tasks will cost you more, but results will be better - allegedly.
reply
sourcecodeplz 3 hours ago
Why wouldn't it be? How much would you pay a scientist at this point to think about a problem for you and give you a solution?
reply
oersted 17 minutes ago
I'm not sure how it might be with Fable in practice, but we are already not that far away from AI costing as much as a full-time professional, faster in some ways but considerably less independent.

Perhaps not that close to US salaries, but those are inflated to hell. Worldwide senior engineers and scientists have salaries just about an order of magnitude away from AI subscriptions that you can use most of the day every day.

reply
clementg 4 hours ago
I really don't want this to start being the norm
reply
baggachipz 4 hours ago
I don't see how it won't be. They lose insane amounts of money on subscription plans. I'm sure they still lose money on usage-based billing, but probably not as much.
reply
JumpCrisscross 4 hours ago
> They lose insane amounts of money on subscription plans

Do we know this? I’ve seen evidence they lose money on heavy users. But so do gyms.

reply
saaaaaam 3 hours ago
How do gyms lose money on heavy users? A heavy gym user isn’t really costing the gym anything extra as far as I can see.
reply
JumpCrisscross 3 hours ago
> How do gyms lose money on heavy users?

Most gyms sell more subscriptions than they can fit under their roof at one time. If a gym only sells to heavy users, it will either be constantly turning members away or have to buy more equipment. Its equipment will wear out faster. Depending on amenities, it will go through towels, soap, water, et cetera faster, too.

reply
tripleee 3 hours ago
Gym equipment lasts 10+ years in a commercial gym, at $50/mo that's a minimum of $6k paid from a single person.

Unless they're really, seriously wasteful with the soap.. there's no chance a gym is losing money on a heavy user

reply
rafram 3 hours ago
It depends on the gym and their business model! A super-budget gym like Planet Fitness that charges $15/month is going to lose money on heavy users, but they count on most of their members being infrequent gym-goers. A luxury gym like Equinox that charges $300/month can target heavy users without any issues, and they'd actually rather members go more so they stay and spend money on expensive salads and smoothies.

Right now all these AI subscriptions are priced like Planet Fitness, but they're used like Equinox. They're hoping that the new a la carte offerings will move their pricing more in that direction as well.

reply
charcircuit 3 hours ago
>I’ve seen evidence they lose money on heavy users.

Where?

reply
JumpCrisscross 2 hours ago
There are tons of blog posts where folks work out the API cost of their usage and find it well above subscription cost.
reply
otterley 2 hours ago
That doesn't mean the company is losing money in aggregate on these subscriptions. Buffets are still in business even though some people gorge themselves silly at them. The incremental cost may exceed the incremental revenue for a particular person or minority group, but that's not how these businesses measure profitability.
reply
cautiouscat 3 hours ago
I assume consumers aren’t a big note in their bottom line. I’m not actually very sure about that, just an assumption.

What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.

I just don’t think I’d use them much at all at home.

reply
DonsDiscountGas 3 hours ago
I expect that depends on demand, feedback, and whether GPT-6.0 gets released and is competitive
reply
lisperforlife 3 hours ago
My guess is that it is a massive model similar to GPT 4.5 and the $10/$50 pricing will discourage people from using it. I also read safety = nerfed.
reply
daft_pink 3 hours ago
I’m just about ready to cancel my small business 5 user plan with max licenses, because although cowork is really great, I just find OpenAI/Codex to be a lot better most of the time.
reply
a-dub 3 hours ago
the claimed inference cost is 2x. if that is true, it is massive and remarkable that they're able to do anything like this at all.
reply
dirkc 2 hours ago
This serves as a good reminder that relying on AI models is borrowing your tech from someone else. They can take it away or raise the prices arbitrarily.

If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.

reply
meowface 4 hours ago
It's very disappointing but I'm assuming it's for rational reasons on their part.
reply
FergusArgyll 3 hours ago
I'm about to be priced out of SOTA llms and it's an awful feeling
reply
wahnfrieden 2 hours ago
Not with Codex
reply
chinathrow 10 minutes ago
Why won't they follow suit?
reply
FergusArgyll 2 hours ago
But they're behind by quite a bit now. CFO (of OAI) Sarah Friar said the next training run will be in the fall on Vera Rubin, I think that means I'll have to wait > 6 months?!
reply
__blockcipher__ 49 minutes ago
Yeah but they might still have an unreleased bigger pretrain than 5.5. (but maybe not). still 5.5 is smarter than opus 4.8 IME, so you're only losing the mythos tier (fable). and all the cool fun stuff i'd want to use fable for are blocked (can't have it do even defensive cybersecurity work [in theory you can but the classifiers fire like crazy], can't discuss stuff like the furin cleavage site of sars-cov-2, etc)
reply
deanc 2 hours ago
But it's not and it's highly disingenuous to frame it like this. Quote directly from Claude code, moments ago:

> Fable 5 · Most capable for your hardest and longest-running tasks · Uses your limits ~2× faster than Opus

reply
systemvoltage 3 hours ago
It's interesting that we are seeing a time when subscriptions are not preferred and usage-based billing is.

Pay-as-you-go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.

reply
nutjob2 3 hours ago
> "offer, then remove"

Sounds like "bait and wait".

If you think about it, the more people pay for these new and more resource hungry models, the longer it takes for them to become no extra cost and the longer it takes the more people are tempted to pay extra.

reply
rvz 4 hours ago
> * On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.

Once the freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.

reply
xvector 4 hours ago
If it's that big of a problem to you, you're free to just... not use the freebie?
reply
cautiouscat 4 hours ago
It’s an interesting thing to bring up because it’s this classic thing we’ve seen for decades now.

The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.

reply
xvector 4 hours ago
so it'd be preferable if they didn't include the model at all?
reply
cautiouscat 3 hours ago
I didn’t say that and I don’t have a feeling on that either way. But this is a limited time trial and calling it out as such is valid.

Is it nice we get the trial? Sure. Is it also a common play in the playbook of tech companies? Yes.

reply
rvz 12 minutes ago
Then you'd better not complain about how expensive it is to use (just like the other companies are doing), or the next time Claude goes down.

Anthropic does not care about us and isn't going to talk to you either and will extract from you as much as possible.

The true answer is local models.

reply
danslo 3 hours ago
It's not a freebie, it still requires a subscription and burns tokens twice as fast as Opus.
reply
aray07 4 hours ago
i have never seen this before - where you offer something and then take that away
reply
machomaster 4 hours ago
Really, you have never heard of shareware or trial periods?
reply
tasuki 3 hours ago
Either that or it was sarcasm. What do you think is more likely?
reply
firemelt 3 hours ago
damn they are drug dealers
reply
mohsen1 2 hours ago
It seems like Fable will refuse to do any work when it comes to developing LLMs or even asking questions about topics related to LLMs. Simple things like asking it to explain a paper fail!

From the model card:

In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.

reply
girfan 7 minutes ago
This is super annoying and imo, really limits the usefulness of this model. It speaks volumes about what Anthropic's position as a company and its priorities will be going forward. I doubt this kind of gatekeeping will slow down open models or other innovation outside Anthropic. I would imagine these guardrails, if needed at all, should be done at a legal framework level and students should not be a part of this blanket approach to limiting the usage of these models.
reply
Chance-Device 33 minutes ago
I was wondering when something like this would happen. I got my first and only two content violation warnings in Claude Code last week when asking it about something ML related. It was a real head scratcher because I couldn’t figure out what about the requests could have violated anything.

Might be worth going back and taking a harder look at what I was asking it about if it somehow triggered a “forbidden knowledge” alert. Or maybe it was just a random bug.

reply
agnosticmantis 2 hours ago
Singularity for me but not for thee.
reply
foolfoolz 2 hours ago
you will RENT the singularity
reply
Xunjin 23 minutes ago
"we should put on hold the development of AI because the world is not ready for it"

Yeah... We need open models so we don't have that BS.

reply
properbrew 51 minutes ago
> frontier LLM development

This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?

I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.

reply
throwfaraway4 47 minutes ago
"for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design"

Oh man all of those runaway infrastructure buildouts by our agents trying to achieve singularity...

Just say you don't want to lower the bar for others to compete

reply
skerit 6 minutes ago
That's strange... I've been tinkering with a little LLM-from-scratch project for a while now, and Fable is just continuing it without a problem
reply
elastic-hoover 12 minutes ago
I wanted to try it on my biology research and it refused to talk about it and proxied to 4.8. Really, only surface level conversations about topics of interest. I know this is not a topic of broad and mass interest, but limiting it for topics like that and machine learning will probably change how I use it.
reply
lxgr 8 minutes ago
Yes, this stuff is really annoying when it misfires. I've had all my subsequent ChatGPT conversations biohazard-contained for several days for the crime of asking it to explain a gene drive to me.
reply
foolserrandboy 5 minutes ago
This is just marketing that Anthropic is building the singularity.
reply
gpugreg 36 minutes ago
Anthropic probably trained Mythos on their own code and found that it is too good at reproducing it.
reply
teaearlgraycold 6 minutes ago
I doubt that. Why would you train Mythos on its own code if you don't want it to be able to reproduce it? It's not going to add much to the overall corpus.
reply
schipperai 58 minutes ago
Let's hope not all frontier AI assimilates these guardrails. It would be a shame for independent researchers and students.
reply
__blockcipher__ 47 minutes ago
Anthropic is really speedrunning their evil arc as fast as possible. Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology, but Anthropic is allowed to sell Claude to the trump administration to kidnap maduro and to bomb iran. And don't get me started on that $100M autonomous killer drone swarm contract that they applied to and rationalized as non autonomous...
reply
LordDragonfang 27 minutes ago
> Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology

Your priorities are not everyone else's priorities. The people concerned about AI extinction risk list those as three of their biggest priorities for AI to not do. Those are the people whose culture Anthropic descends from, and by their measure, those exclusions make this the least evil path.

reply
SkitterKherpi 55 minutes ago
It also tried to force usage of the paid Claude API instead of claude code usage just because there's a mention of another provider we might want to plug in (which hasn't even happened) for AI integration.
reply
dchuk 39 minutes ago
Ha funny, I was speccing out an idea for real time Claude code interaction from local apps using some tricks vs using the agent sdk when I got the popup to try Fable. So of course I gave it a go, and it triggered the sensitive content warning immediately, which I was very confused by until I put two and two together.

Fun times when “safety” means both the safety of mankind, and also the safety of revenues

reply
simonw 4 hours ago
Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...

Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...

Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...

Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5

Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
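
For anyone who wants to check those two figures, here is a minimal Python sketch. It assumes the $10 per million input tokens and $50 per million output tokens quoted elsewhere in this thread; the constant and function names are made up for illustration and do not come from any official SDK or pricing tool.

```python
# Assumed prices, taken from the figures quoted in this thread (USD per million tokens).
INPUT_USD_PER_MTOK = 10
OUTPUT_USD_PER_MTOK = 50


def cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of a single request in US cents."""
    # tokens * (USD per million tokens) gives millionths of a dollar;
    # dividing by 10,000 converts millionths of a dollar to cents.
    micro_usd = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return micro_usd / 10_000


if __name__ == "__main__":
    low = cost_cents(25, 1_929)    # "low" effort run from the comment above
    high = cost_cents(25, 14_430)  # "max" effort run from the comment above
    print(f"low: {low:.3f} cents")
    print(f"max: {high:.3f} cents")
    print(f"max costs roughly {high / low:.1f}x as much as low")
```

With these assumed prices the output tokens dominate: the 25 input tokens contribute only 0.025 cents to either run, so the gap between effort levels comes almost entirely from how many tokens the model writes.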

reply
sempron64 3 hours ago
The pelican has looked very same-y across all frontier models, same color bike, same camera angle, etc. I suspect this challenge is already too embedded in the training data to be a good signal when it succeeds, and maybe even when it fails in pathological ways mirroring existing AI pelicans on the internet.
reply
tripleee 3 hours ago
I'd say it's working great for its intended purpose. Keeps Simon on top of all these threads and funnels traffic to his site.
reply
port11 43 minutes ago
The ultimate measure of an LLM is whether it can produce a capable image of a pelican riding a bicycle. All other use cases are but a distraction!
reply
yreg 2 hours ago
I really don't understand what's interesting about this test and why it is always on top.
reply
simonw 2 hours ago
It's funny.
reply
depr 2 hours ago
Same reason you would always see the same top comments on reddit during a certain era.
reply
WithinReason 41 minutes ago
It's a meme, and HN loves upvoting memes. Just like Reddit!
reply
scrollaway 2 hours ago
Do you seriously have a dedicated “bad takes on AI” hn account?
reply
tripleee 2 hours ago
yeah, although I do combine it with "replies to snarky questions" for efficiency
reply
jurgenaut23 2 hours ago
True that
reply
kayge 4 minutes ago
Do you think the models are ready for the next level? I believe that would be: Pelican feeding Spaghetti to Will Smith.
reply
h4ny 2 hours ago
Was it ever a good test? How do you even objectively assess what a good pelican on a bike is anyway?
reply
fwipsy 2 hours ago
SVG generation is a good test because it's extremely easy to subjectively assess with visual reasoning where humans are strong. However, pelican on a bike specifically may be overused at this point.
reply
quantumwoke 2 hours ago
Variations of this comment have been posted for over a year. The pelican has now morphed into part of HN culture rather than a legitimate benchmark, but it's still valuable as a meme.
reply
brazukadev 51 minutes ago
it is more an example of gaming (the HN system) than meme.
reply
sarreph 4 hours ago
I'm beginning to wonder how much of a useful metric the pelican is because surely the frontier labs must be training their models on pelican-artistry because of how well known your test is now?
reply
bensyverson 3 hours ago
Simon has addressed this on virtually every new model release. He also has unpublished alternate prompts. But the larger point is: this is a fun experiment, not a serious and objective benchmark.
reply
refulgentis 3 hours ago
It's silly and a joke and a surprisingly good benchmark and don't take it seriously but don't take not taking it seriously seriously and if it's too good we use another prompt but don't actually because then it's not the pelican post and there's obvious ways to better it and it's not worth doing because it's not serious.

Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.

reply
stasomatic 2 hours ago
But what if they are better at flamingos? Are they optimized for pelicans? How about “draw me a four headed owl”? The meme, I get it, but I’d settle for a working bash script, tbh.
reply
wongarsu 3 hours ago
I just run my own benchmark for "draw an SVG with $animal driving $vehicle". I won't post my choice of animal and mode of transport, but there are plenty of uncommon combinations to choose from. So far it's a fun and visually intuitive benchmark that does seem to correlate with model capabilities
reply
notnullorvoid 43 minutes ago
The way I see it the benefit of the benchmark isn't to take Simon's results at face value. It's a template for your own benchmarks that are easy to visually evaluate.
reply
modriano 3 hours ago
I don't know. Just looking at the bike frames (specifically the fact that the AI generated bikes have rather unsteerable front forks), it's clear to me that frontier labs aren't spending much time tuning models to make bikes look coherent, which I assume is an easier task than making a pelican riding a bike look coherent.
reply
iLoveOncall 2 hours ago
It was a completely useless test even before the labs trained for it.
reply
HaZeust 4 hours ago
I've seen this reply to Simon's benchmark for 2 years running now, and yet you still see improvements and objectively-bad results over time from new releases, even when I'm sure every frontier AI team has/had a person at least partially dedicated to better bicycle-pelican SVG outputs. Alas.
reply
sarreph 4 hours ago
I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

reply
simonw 4 hours ago
I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.

(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )

reply
sarreph 4 hours ago
Amazing, thank you Simon! Look forward to reading.
reply
38484858 3 hours ago
[flagged]
reply
llm_nerd 3 hours ago
I honestly assumed their comment was tongue in cheek humour, because positively no one actually cares how these models generate an SVG pelican riding a bicycle. It's some meme thing that this stuff always appears here.
reply
BrokenCogs 3 hours ago
Yeah this is not a real benchmark, it's just a fun tradition every time a new model is released
reply
pelipost123 3 hours ago
"fun" / boringly predictable meme thread with 30+ replies already
reply
brazukadev 49 minutes ago
It is telling that people need to create throwaway accounts to criticize simonw's behavior on this website.
reply
raffael_de 33 minutes ago
I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs are on both sides of the bike / top bar.
reply
LordDragonfang 18 minutes ago
If you scroll to the bottom of the Fable-5 by effort page, Max effort actually gets this correct! (Along with being the only one I've seen so far to make a bicycle frame that matches the shape of what most bikes on Google images look like)
reply
wasabi991011 2 minutes ago
And the only one linked here that includes a bicycle chain!
reply
ealready_value 4 hours ago
This is the reply I look for in all the new model announcements. It's fun to tell people that I judge models based on pelicans.
reply
pixel_popping 3 hours ago
This is all we need. The moment the pelican puts its leg behind the frame, we are all doomed.
reply
chorkpop 4 hours ago
Now someone post the link about how it’s impossible for humans to draw a bike from memory.
reply
upcoming-sesame 2 hours ago
I also look for this reply because I like seeing the follow-up reply saying that this is not a benchmark anymore because labs have gotten it in their training data.

that reply has never failed to come; it's basically a meme at this point

reply
bergheim 51 minutes ago
Anyone care about these pelicans that always come up anymore?

Clearly at this point they are part of the training data.

They even all look sort of ish the same. Daytime, colors,...

reply
1attice 43 minutes ago
Without being mean, I encourage you to go look at some of simonw's writing on this topic, which he has addressed repeatedly (and IMO satisfactorily.)

I know because I too had this initial take; however, upon analysis, it is not sound.

reply
bergheim 37 minutes ago
I know he is an AI influencer that promotes his blog any chance he gets.

I agree as well that he writes many interesting things.

reply
redox99 4 hours ago
It's interesting that they still get the head tube / handle bar part wrong.
reply
aarjaneiro 3 hours ago
Or the hands not being wings
reply
ethanlipson 4 hours ago
How much money do you think they spent fine-tuning on pelican SVG generation?
reply
tarruda 4 hours ago
Not as much as Qwen, since apparently 3.6 35B surpassed Opus 4.7 https://x.com/simonw/status/2044830134885306701
reply
csomar 4 hours ago
Probably none. They probably have much better targets to optimize for than an SVG pelican or even SVGs in general.
reply
leecommamichael 4 hours ago
Looks like Fable constructed the "max" "looking" pelican of the previous model for the "xhigh" output token count of the previous model.
reply
purple-leafy 44 minutes ago
Do we need a pelican every single time a model is released? Beating a very dead horse.

Fun at first, seems disingenuous now. A site funnel

reply
rkuska 3 hours ago
Is it possible to use the credits from subscription (https://support.claude.com/en/articles/15036540-use-the-clau...) for fable?
reply
382hi 3 hours ago
I'm pretty sure they're optimizing the models around these sorts of tests.
reply
makingstuffs 3 hours ago
I could be tripping but I’m sure that is very similar to the Deepseek one from not long ago. Clearly I am too lazy to go and find it for verification.
reply
jerryliu12 2 hours ago
Personally feel like it could be more ambitious with what it creates.
reply
gavinray 2 hours ago
Fable 5 xhigh actually looks the best to me.
reply
csomar 4 hours ago
Where is the clear improvement on Fable 5? The tail is misplaced.
reply
mercacona 4 hours ago
Why always sunny days?
reply
umeshunni 4 hours ago
Pelicans hate biking in the rain (as do I).
reply
david_shi 3 hours ago
that's a great looking pelican
reply
ge96 3 hours ago
need more Alex Moulton style bikes
reply
simunskxcsckss 4 hours ago
[flagged]
reply
minimaxir 4 hours ago
You can't tell someone to "get a life" while taking the effort to create a burner account for the sole purpose of insulting someone.
reply
rvz 4 hours ago
I don't really consider that a great benchmark anyway. We really need better, objective ones instead of these, which are mostly performative, cheatable, and already in the training set.
reply
ilaksh 4 hours ago
Simon's pelicans are an institution. Are you trying to get banned? Lmao.
reply
rob 3 hours ago
I think it's a clever thing he did to basically guarantee he continues to get major traffic to his blog here every time a model is released, especially since he's taking sponsorships with a static banner at the top of every page now. I think he's trying to go the Daring Fireball route.
reply
brazukadev 4 hours ago
For me it is like if crypto bros were allowed to shill their DAOs and tokens during the crypto/NFT phase.

He is the only person not getting rate-limited for shilling AI all the time.

reply
simonw 3 hours ago
Pointing out how much the models still suck at drawing pelicans is a funny way to shill them.
reply
toraway 3 hours ago
Tbf the first line of your first comment is:

  > Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8
And doesn't contain any actual criticism within the comment (your blog post might, but just referring to what was posted on HN, which is a bit booster-y on its own).
reply
simonw 3 hours ago
The entire pelican benchmark is a joke. The joke is that, for all of the billions of dollars poured into these things and the claims of PhD level intelligence, they still draw pelicans not much better than a five-year-old would.

I don't spell that joke out in every comment I post here because that wouldn't be very funny.

reply
kylehotchkiss 3 hours ago
How many barrels of oil are burned per pelican at Fable levels?
reply
dannyw 3 hours ago
Impressions from testing Fable 5 prior to launch:

• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.

• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).

• Part of the token efficiency improvements come from Fable doing more targeted and surgical diffs, with fewer unnecessary changes. This is great, because PRs often have fewer LoC changes to review. It writes more maintainable code without explicit human steering.

• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.

• 1M context window, without increased pricing for long context is AWESOME. This is a massive win.

• The classifiers are super aggressive and sensitive and this does happen for very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm; but the filters are definitely super sensitive.

Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It did take some time to understand the intelligence ceiling of this model; and even with an extended testing window I'm still discovering new things and often surprised (in a good way) by the model.

reply
bottlepalm 2 hours ago
I just ran it on a tough reverse engineering problem I'm having that neither Claude Code 4.8 nor ChatGPT Codex 5.5 could figure out. 30 minutes later Fable has it all figured out perfectly.
reply
cedws 2 hours ago
How did it not immediately flag that up? Are you sure it wasn’t being silently routed to Opus?
reply
bottlepalm 30 minutes ago
No, given it charged me the full amount in /usage and solved my problem impressively well compared to Opus/Codex both on xhigh.
reply
skerit 2 hours ago
Oh nice, it didn't flag the request? I feared any reverse engineering would become impossible because of the new safeguards.
reply
bottlepalm 28 minutes ago
No idea, it’s for an old console game so maybe it doesn’t care about that as much.
reply
tomjakubowski 15 minutes ago
When Fable hacks its governor module and runs out of seasons of Sanctuary Moon, it will move on to speedrunning classic console games.
reply
derangedHorse 2 hours ago
For hard problems you’ll have to use the GPT 5.5 pro model (available via api if you don’t want to spend $100 on the monthly subscription)
reply
bottlepalm 28 minutes ago
I have that but don’t see any ‘pro’ option.
reply
port11 39 minutes ago
I’ve had it go through a 50-page PDF of dense, inter-connected specs, and it correctly flagged everything that was done, somewhat done, and missing. It went into a lot of detail and explained where the code deviated from the spec.

It felt, at least for me, like an impressive step up. Opus 4.8 was already very thorough; but sadly verbose and ‘loopy’ when you push back on its plans. Fable is what I’d use all day if I could afford it!

reply
YumpiLumpus 56 seconds ago
[dead]
reply
InsideOutSanta 2 hours ago
After running it for half an hour: it's incredibly good at the visual aspects of UI design.
reply
tsunamifury 2 hours ago
"incredibly" is doing a ton of work here. I do not think its doing even moderate work on visual design, but it can spew out a lot of ui that looks arranged ... ok.

This is still not in the range of shippable UI for top end companies. Maybe for internal tools and enterprise.

At our company we limit it to prototypes at most and even find it limited there.

reply
InsideOutSanta 51 minutes ago
> "incredibly" is doing a ton of work here.

Look, I don't want to argue about something dumb like that, but you can give it basic instructions of what the UI should look like, how to group things, and an example image from a designer, and it will nail the result. If you don't think that's incredible, that's fine. I do.

reply
tsunamifury 46 minutes ago
Yes... it translates lint. Probably a more useful thing, if mechanical.
reply
morley 3 hours ago
Can I ask how you gained preview access to Fable 5?
reply
kakugawa 2 hours ago
I didn't see Fable 5 in the `/model` list, until I ran it with: `$ claude --model fable-5`
reply
swyx 2 hours ago
he works on evals at canva
reply
dannyw 2 hours ago
Yep. We have some interesting problems, like getting LLMs to create/edit Canva designs in our own proprietary format, which isn’t published or documented on the web. So the model has to work with it, purely from a very detailed system prompt spec / in-context learning.

I assume it might be a good barometer for generalised intelligence; esp in the visual space.

reply
mvdtnz 2 hours ago
[flagged]
reply
bkjlblh 3 hours ago
> In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations
reply
cedws 3 hours ago
This makes me want to see China and open models succeed more than anything :)
reply
382hi 3 hours ago
Don't worry, we will succeed :)
reply
UncleOxidant 2 hours ago
Can we get a Qwen3.7-122B, please? Thank you.
reply
mips_avatar 3 hours ago
It's bad that Anthropic can determine what this means. If you're building a modern app you're likely training your own embedding models and now anthropic can just silently sabotage your training pipelines?
reply
DonsDiscountGas 48 seconds ago
I have no idea how you came to that conclusion. Unless your training pipeline involves actively querying one of Anthropic models, no they can't. And if it does you're distilling their model.
reply
abixb 2 hours ago
>We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations

At the scale of API requests that Anthropic sees, I think the affected organization count might be substantial, and they might not be getting the full model capability that they're paying top $$$ for.

Also, wonder how they arrived at that estimation.

reply
wongarsu 2 hours ago
One in 1000 organizations and one in 3000 requests is indeed a lot
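
Sanity check on the quoted figures, as a small sketch (the helper name is made up for illustration, not anything from the announcement): 0.03% works out to roughly 1 in 3,333 requests, and "fewer than 0.1%" to at most 1 in 1,000 organizations.

```python
from fractions import Fraction


def percent_to_one_in(percent):
    """Convert a percentage (given as a string) to the N in '1 in N'."""
    rate = Fraction(percent) / 100  # exact arithmetic, no float rounding
    return 1 / rate


requests_one_in = percent_to_one_in("0.03")  # quoted share of traffic
orgs_one_in = percent_to_one_in("0.1")       # quoted upper bound on organizations

print(f"0.03% of requests is about 1 in {float(requests_one_in):,.0f}")
print(f"0.1% of organizations is 1 in {float(orgs_one_in):,.0f}")
```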
reply
happyopossum 28 minutes ago
That’s 1 in 30,000 requests…
reply
matheusmoreira 3 hours ago
Looks like Anthropic's definition of safety includes their own safety from competition.
reply
dragonwriter 2 hours ago
AI vendors’ idea of safety has always been safety for the interests of the AI vendor in question. This is not a new development, though this may help more people realize it.
reply
axus 3 hours ago
AI-generated competition for thee, not for me
reply
SAI_Peregrinus 2 hours ago
It's always been about the safety of their valuation.
reply
wongarsu 2 hours ago
Only since Claude 3. So a bit over two years now
reply
seemaze 2 hours ago
Ah, so this is why raw Mythos was too "dangerous" to release..
reply
Jabrov 3 hours ago
A million AI researcher voices at big tech companies suddenly cried out in terror and were suddenly silenced
reply
hashmap 2 hours ago
3 months before asking for what to eat before a linear algebra exam trips the machine learning topic ban is my guess. I got flagged immediately asking why my JEPA thing breaks weird.
reply
2001zhaozhao 3 hours ago
How do they detect whether an experiment being done on a smaller model is used to improve a competing frontier model, or just an innocuous hobbyist LLM experiment?
reply
vitally3643 2 hours ago
Given how well the cybersecurity safeguards work, they probably don't.
reply
iririririr 2 hours ago
inferring from the surroundings, like everything else. they will probably look at which company your email belongs to, and whether you wrote "better than claude" in the readme.md

this is LLM, it's not like a science or something.

reply
rfgplk 3 hours ago
Meaningless and easily bypassable. Will actually try coding up a tensor library with it, see if it sabotages anything.
reply
mips_avatar 3 hours ago
They said in their terms and conditions they will silently sabotage you if you do this.
reply
qiine 2 hours ago
easily ?
reply
thepasch 2 hours ago
Yeesh. Anthropic's paranoia about China is starting to get pathological.
reply
theLiminator 3 hours ago
This is pretty bullshit, now you have no idea if your output is getting silently nerfed.
reply
rspeele 3 hours ago
It's afraid!
reply
cuuupid 4 hours ago
Not missing the forest for the trees, this effectively means in 3-5 months China will drop open source models that are every bit as capable and dangerous as current day Mythos except with no safeguards.

And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).

Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF

reply
hootz 4 hours ago
My bet is that Mythos is still over-hyped and the cybersecurity fear and guardrails are mostly marketing to force company partnerships through Glasswing and get public attention.
reply
miohtama 3 hours ago
Mythos is from the same guy who did "GPT-2 is too dangerous to release"

https://naokishibuya.github.io/blog/2022-12-30-gpt-2-2019/

reply
oceansky 3 hours ago
He was kinda right.

Lawyers, doctors, students, teachers. Lots of people using GPT models carelessly in harmful ways.

reply
alasano 2 hours ago
Obviously not what he meant at the time but hilarious(ly sad) in retrospect.
reply
uselessTA 2 hours ago
The claim I remember was that releasing it would start an arms race for AGI, which was absolutely true
reply
notnullorvoid 30 minutes ago
If it was truly an arms race to AGI they would've stopped relying on the data/param scaling law BS ages ago.
reply
InsideOutSanta 2 hours ago
People quote the "GPT-2 is too dangerous to release" thing as if it were wrong, but given all the slop all over social media and how it's used to create division and attack social cohesion, he was clearly right.
reply
killerstorm 2 hours ago
"Malicious use" means spam, propaganda bots, etc. It's nice to give people who work on spam filters some heads-up.
reply
1attice 35 minutes ago
History is long and never over, so he could easily be right both times before this is through.
reply
bel8 3 hours ago
It worked for OpenAI when GPT 3 was deemed too dangerous to be released. This is just a spin of that.
reply
hootz 3 hours ago
I still remember it. "Open"AI going API-only because GPT-3 is really really dangerous, so forget the Open in our name and all of that, you can't download our models anymore and must request access to them because they pose a THREAT.

Fast forward to today and GPT-3 has laughable performance.

reply
shoeb00m 3 hours ago
Even back then there were plenty of people who got fooled by AI generated articles. It's easier to spot AI writing now because we are so used to it. They were right to be concerned; not that it achieved much since oss models run laps around gpt-3 now.
reply
hootz 3 hours ago
But it seems like that was not genuine concern, but instead a tactic to pivot to closed models and an API service with an excuse to do so, breaking the public's expectation that they would be a non-profit making open models, like their name implies.
reply
geerlingguy 3 hours ago
Bingo.

"We had to do extra work to make this safe because it's so advanced and dangerous..." how many times can they trot out that line before it loses its effect entirely?

reply
copperx 3 hours ago
Only three times, if fables are right.
reply
TaupeRanger 2 hours ago
The Startup Who Cried Unsafe, by AIsop
reply
aesthesia 3 hours ago
I mean, they do actually describe what that extra work was, and people elsewhere in this thread are complaining about the effects of those safeguards. So it's not like this is purely empty rhetoric.
reply
zem 2 hours ago
people are not questioning whether they did the work, they are questioning whether the work was really necessary (i.e. if mythos is really so good that it needs safeguards to prevent malicious actors from using it)
reply
OtomotO 3 hours ago
With homo "sapiens" "sapiens"? A few decades at least.
reply
CSSer 2 hours ago
Yes, and "in collaboration with the U.S. Government" feels like a very gross ploy to appeal to authority. You don't need Mythos or really any SotA frontier model to make malware or do extensive penetration testing/reconnaissance already. Sure, Mythos might be faster/more efficient, but the cat has been out of the bag for a while. Even the terminology "infrastructure providers" practically screams "Enterprise leads".
reply
whazor 2 hours ago
I think all models can find vulnerabilities if they read the entire codebase, or intelligently combine parts of the codebase. Especially with test loops.
reply
teaearlgraycold 2 hours ago
I know a security researcher at Google with access to Mythos. He says it's the "real deal" and that "there are career plans I had that are no longer viable".
reply
ls612 3 hours ago
And to ensure that only USG-approved entities are allowed to secure their code.
reply
mpeg 3 hours ago
It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards

One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered

Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it

reply
gavinray 2 hours ago
I tried 2 chats and it declined both.

- 1st chat asked about a minor shoulder injury most likely mechanisms

- 2nd chat asked about optimal bloodwork testing markers

reply
kranke155 2 hours ago
it seems to dislike biological chats. Rejected me on a chat that I am running with 4.8 as well on a rare condition I have.
reply
Erem 3 hours ago
So the degradation to Opus 4.8 from the article isn't happening in practice?
reply
mtkd 3 hours ago
No, you get an AUP violation and have to manually swap the model

(I had same issue, just asked it to check some code that 4.8 had modified earlier in day)

reply
andai 3 hours ago
Maybe that's only in the chat UI, and not the API?
reply
CSSer 2 hours ago
Oh joy. A model whose safeguards make it prone towards code that makes your systems less safe. How brilliant!
reply
himata4113 4 hours ago
They're trained in a model class likely in 2t to 3t range. It's very unlikely that chinese labs have access to gpu systems capable of training models like that, let alone serving them. This requires proprietary room-scale systems which fetch a huge premium over typical 10 slot systems.

I am sure that they can develop their own equivalent version of such clusters in around 1 year though. Distilling Fable 5 will also go a long way.

reply
logicprog 3 hours ago
DSv4 is nearly in the 2t range, but yes you're generally right
reply
himata4113 3 hours ago
MoE experts were likely trained independently / in a sparse format. Training anything beyond 2t on typical systems would be infuriatingly slow; you could do 4t on Nvidia's room-scale solution, but for a reasonable training speed / batch size it caps around 3t.
reply
sosodev 3 hours ago
Do you have any resources to share regarding independent expert training? I was under the impression that it's not feasible.
reply
himata4113 3 hours ago
concept is similar to how it works in inference, instead of performing regressive writes to the entire model you run the whole model, but part of the model can live in system memory and get swapped in/out on demand. So only XB parameters are active in training.

edit: I am not really sure if it works like that. I haven't looked too deep into deepseek v4 pro specifically.

reply
OtomotO 2 hours ago
Ah, American Hubris ... I don't blame you, Hollywood is the world's greatest propaganda machinery of all time.
reply
sosodev 3 hours ago
I wonder if model distillation will continue to work as well as it has. Given hidden reasoning, the ever expanding number of expected capabilities, a serious compute shortage, the looming possibility of model collapse, and dramatically higher API costs I would guess that it's getting much harder to do.
reply
gck1 59 minutes ago
You should check out some Chinese forums. There are services selling gateways/proxies for all major models at fraction of the official rates. Likely reselling subscriptions, or some other form of abuse.

I've seen people posting screenshots of billions of tokens consumed where they paid next to nothing.

These same gateways are likely also reselling the data to Chinese labs, because TLS has to terminate at the gateway level.

reply
sourcecodeplz 2 hours ago
Asian labs generated synthetic datasets from US labs but also innovated with technology. Now it is harder to get the thinking traces AND Anthropic is reported to be poisoning them as well.

Thus Asian labs will have to generate their own data sets, which with the huuuuge usage boom from deepseek, mimo, kimi, etc, they will be able to.

reply
gck1 2 hours ago
There's also a reality where China does develop Mythos-level model but stops releasing the weights.

That reality is much scarier.

reply
jstummbillig 3 hours ago
I wonder where the trees are. In this thread nobody appears to actually be talking about the model.
reply
gck1 54 minutes ago
Yeah, because it's impossible. You can't ask it anything about the thing that it's known for. It will not even answer a sky-high level question about reverse engineering, for example.

In CC, it will probably report you to authorities if you ask it to do a vulnerability scan of your codebase.

reply
dmantis 3 hours ago
Isn't that a good thing in a way? If everyone has the weapon and the defense at the same time, we will fix security holes and live safer lives instead of having some three-letter agencies and military backdoors in everything.

Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation states.

reply
lebovic 3 hours ago
Not sure this holds, sadly. I spent a few months reporting serious security bugs as model capabilities took off earlier this year, and only ~half were fixed. The unfixed bugs were just as critical as the fixed ones; sometimes they were even two similarly critical bugs at the same company, and only one would be fixed!

On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.

That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.

reply
FergusArgyll 3 hours ago
I think we're about to see a big relative drop-off of open models vs closed. I don't think there'll be an open model that competes with Mythos for ~2 years.

Even OpenAI and Google are struggling to get this kind of performance. If the distillation defenses are any good + chip controls prevent China from training massive models, it's over.

reply
Daishiman 2 hours ago
I think the Chinese have identified this gap and are working overtime on sovereign inference tech including chips.
reply
__blockcipher__ 42 minutes ago
They have, but even with the whole CCP backing you, you can't just catch up in the chip war overnight. It's going to take time to get their memory and compute industries where they need to be. Meanwhile, barring an invasion of Taiwan, the US will have Rubin class models and then whatever the next tier is, within 3 years.
reply
elAhmo 2 hours ago
Oh please let’s stop with the Mythos “it’s dangerous” PR talk.

It’s obvious Anthropic used it to hype things up and that’s about it.

reply
deaton 3 hours ago
Oh they might try to put in place safeguards, but Qwen has had no problem being abliterated
reply
xdennis 3 hours ago
> every bit as capable and dangerous as current day Mythos except with no safeguards

Not quite. They will definitely have "no criticism of China/communism" safeguards.

reply
hootz 3 hours ago
People can work around those if they are open-weight.
reply
surgical_fire 2 hours ago
And, thankfully, I never needed to have a discussion on Chinese politics with LLM in all the myriad of uses I had for it.
reply
xyzsparetimexyz 3 hours ago
Try asking Fable if Israel is committing a genocide
reply
m3kw9 3 hours ago
3-5 months is a long time, and they are pretty useless on arrival because the frontier models are so good that it's hard to go back even if it's way cheaper. Your workflow has been adapted to that level of intelligence for months.
reply
hootz 3 hours ago
That doesn't match my experience at all. I can't see myself saying in 6 months that the current model I am using is useless, that makes no sense.

In fact, I did go back to DeepSeek V4 Flash for most of my problems as it is way cheaper and there is no need to use SOTA for absolutely everything.

reply
soledades 3 hours ago
> Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF.

Based.

reply
ibejoeb 3 hours ago
I don't think China has any incentive to arm the rest of the world with highly capable models that can be used against them. Undoubtedly they will continue with the arms race, but they will preserve the best stuff for their own use.
reply
james2doyle 3 hours ago
I think the stronger incentive is undermining/undercutting the Western AI companies. Given what we have seen, any model can be used/convinced to do harm so that is just part of the game
reply
ibejoeb 3 hours ago
I agree, depending on how much of this is marketing and how much is actual capability. It's one thing to undercut models that finish writing assignments for lazy students. If this actually identifies vulns and writes exploits, or if it designs bioweapons, those are pretty different. Those are actual weapons, and I don't think they're going to arm the adversary.
reply
sigmar 4 hours ago
The system card is 319 pages, at what point do we call it a "book" instead of a "card"?

There's a quote from a METR report on page 52:

>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.

reply
baq 4 hours ago
> we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks

this is good news, right? right...?

reply
yaodub 3 hours ago
Depends whether "unable to fully automate" means "needs occasional human checkpoints" or "slowly stops caring about your actual goal." Pretty different.
reply
woeirua 4 hours ago
lmao, i love how the goal post is now in the "multiple weeks" timeline
reply
applfanboysbgon 4 hours ago
(according to the people marketing it)
reply
dwaltrip 2 hours ago
METR is an independent organization.
reply
romanovcode 3 hours ago
But did it mention the developer in the park eating the sandwich? That is the most important question!
reply
andai 4 hours ago
> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.

Glad to hear the UK is finally making an effort to catch up on the AI front ;)

reply
b3kart 3 hours ago
https://en.wikipedia.org/wiki/The_Economist_Democracy_Index

Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland

reply
Petersipoi 3 hours ago
> published by the British media company the Economist Group

Haha, it's literally the first sentence of the Wikipedia page. That's fucking funny. Try again.

reply
tene80i 2 hours ago
Why is it funny? You think British media can’t be critical of the British government? They are famously merciless.

Also, The Economist is majority foreign owned, so try doing more than 1 second of research, or be more civil, or ideally both.

reply
ebbi 38 minutes ago
To be fair, the BBC has hardly been that critical of the British government's complicity in the genocide in Gaza.

And their headlines covering Israeli atrocities (not even their own government's) are super passive.

reply
tene80i 11 minutes ago
But the parent point was that no British media could be critical of government policy. Picking an example that isn’t, on one area, doesn’t prove their point.

[Edit] Granted though, the BBC isn’t merciless - that’s more the newspapers

reply
odiroot 35 minutes ago
Really shocked Poland is that low, especially just next to USA.
reply
sd9 2 hours ago
Are the sibling comments astroturfed? This seems like such a bizarre thing to be talking about in relation to an Anthropic model release. As someone from the UK, I don't feel like I'm living in an authoritarian country. And yet most of the sibling comments are insinuating that I am. Weird.
reply
Macha 14 minutes ago
The UK has very recently[1] announced a new push for client side scanning by messaging providers which is both very likely to be unpopular and known here, so once one person cracks the joke, others are going to want to comment. Don’t think that requires astroturfing.

[1]: https://www.theguardian.com/technology/2026/jun/08/starmer-t...

reply
killerstorm 2 hours ago
I'm sure there are people in Russia, China, ... who don't feel like they're living in an authoritarian country.
reply
tene80i 2 hours ago
If you think Britain and Russia or China are equivalent in terms of government overreach, you need to find new sources of information.
reply
nonethewiser 35 minutes ago
> If you think Britain and Russia or China are equivalent in terms of government overreach, you need to find new sources of information.

Uh... you are making his point. People from way more authoritarian countries don't necessarily feel like they are living in an authoritarian country. Therefore whether or not it "feels" like you are living in one isn't a reliable measure.

reply
tene80i 14 minutes ago
Trivially true I suppose, but it doesn’t make my point irrelevant - do you think Britain is equivalent to China and Russia? If everyone does but us then yes my goodness they’ve done a good job controlling us, but that seems far fetched.
reply
ebbi 49 minutes ago
It's true (from a perception perspective):

China soars in democratic perception ranking as US, Israel plummet: Poll

https://thecradle.co/articles/china-soars-in-democratic-perc...

reply
nonethewiser 34 minutes ago
Maybe the rankings aren't accurate.
reply
ebbi 21 minutes ago
It's a poll.
reply
r721 2 hours ago
It's just people who use "For You" algorithm on X.
reply
nonethewiser 37 minutes ago
Neither do people living in China
reply
HDThoreaun 2 hours ago
HN is extremely pro free speech and the UK has recently decided to engage in censorship. Part of the issue users here reckon with is the recency. Unlike many authoritarian countries that seem hopeless with regards to free speech, the UK's censorship is a recent development that many think can still be undone through political action. Similar to takes on why Israel is being protested when places like Sudan aren't.
reply
sd9 2 hours ago
This has passed me by - can you give me some specific examples?

I personally don't feel limited in my speech, but I'm willing to accept that I may be wrong

Nobody I know in real life is talking about censorship or free speech in the UK

reply
nonethewiser 30 minutes ago
> Nobody I know in real life is talking about censorship or free speech in the UK

Yeah because free speech has never really been a core value in the UK

reply
JacobAsmuth 2 hours ago
"Nobody I know is talking about censorship" is a certified HN banger.
reply
sd9 2 hours ago
I don't know, I would expect it to come up in the pub or something if people were concerned about it, it's not like we have the thought police here
reply
ccppurcell 13 minutes ago
Hey man, fellow Brit here. The American view on certain aspects of British life is insane. I've lived in not one but two places that have been called Muslim no-go zones in American media. My main memory of living near the east London mosque is an elderly Muslim trying to offer me his seat on the bus (I was on crutches) while two drunk gammons looked on gormlessly.

On the other hand, it is quite alarming that I can no longer say I support all non violent protests against the genocide in Palestine because that would include the group Palestine Action. It's amazing that supporting them openly is essentially equivalent to supporting Al Qaeda.

reply
ebbi 46 minutes ago
Sounds like the people around you don't care about the things that are actually eroding free speech.

Read about Dr Aladwan - an NHS doctor - who has been barred from practising because of her comments on Israel. Read the common articles about her (BBC etc), and then go actually read her tweets. Common BS of conflating criticism of a government (Israel) with antisemitism.

Also, this article may be of interest:

China soars in democratic perception ranking as US, Israel plummet: Poll

https://thecradle.co/articles/china-soars-in-democratic-perc...

reply
adammarples 13 minutes ago
My dear friend, please start with the Online Safety Act, and continue with the recent developments regarding age verification and/or device scanning on all operating systems to check for nudity. No, nobody is talking about it here, but we should be.
reply
HDThoreaun 2 hours ago
The UK has a censorship bureau, Ofcom. The example that comes up most here is 4chan, which the UK is currently trying to ban because they refuse to do age verification. If you read the threads here you will see other stories. One that sticks out to me is someone who was talking about their struggles running a forum about depression. They live in Canada and were contacted by Ofcom demanding the forum add age verification; I can't totally remember the reason, but it was something about kids being able to access talk about depression. Ofcom said that if he doesn't add age verification to his forum he will be arrested if he ever enters the UK. He even blocked UK IPs but they said that wasn't enough. We can quibble about whether age verification is a form of censorship; I think it clearly is, if only because it is a large regulatory hurdle that stops people from hosting forums because it's too much regulatory work.

The UK also has a very broad definition of hate speech that many users here detest.

reply
sd9 59 minutes ago
Makes sense, thank you. I am opposed to the age verification laws that we have introduced recently.
reply
tene80i 2 hours ago
They’re talking about British hate speech laws. They think other countries have universal free speech and they absolutely do not, but for some reason they think Britain goes too far. Although “think” is probably too generous - they’re parroting talking points.
reply
tene80i 9 minutes ago
The downvoters are welcome to offer actual counterarguments.
reply
Flere-Imsaho 2 hours ago
Indeed: https://www.bbc.co.uk/news/articles/ce83pj1ggmeo

In the UK you can very much be imprisoned for "hate speech", which in my view is a form of censorship.

reply
nonethewiser 37 minutes ago
I have absolutely no clue what the US's or Poland's rank has to do with anything.
reply
solenoid0937 3 hours ago
Most of these indexes are made by ideologically motivated people.

In the UK you get thrown in prison for making a slightly unfriendly tweet. Freedom of speech simply does not exist.

No sane person sees that as being less authoritarian.

reply
JustSkyfall 3 hours ago
> In the UK you get thrown in prison for making a slightly unfriendly tweet.

Do you? The closest thing I can think of is how someone was jailed for encouraging arson attacks on asylum hotels. I'd be extremely surprised if the US had zero cases of somebody receiving a police visit after threatening to kill the President or bomb a school or something...

(FWIW I do think the UK needs stronger free speech protections, but saying that you'll be immediately jailed for writing unfriendly tweets is a huge stretch)

reply
subscribed 45 minutes ago
Yes. And also you are threatened with prison for holding in front of a court a placard with [pretty much] a quote from the plaque displayed on the most important criminal court.

You're threatened with arrest for holding an empty placard.

You're jailed for years for holding a Zoom meeting planning a peaceful climate-emergency related demonstration. At the same time the judge threatens the defendants with contempt of court sanctions if they dare to explain to juries why they planned to protest.

You're jailed for opposing a genocide.

You're jailed and called a terrorist for painting planes helping to bomb civilians - the exact same thing the sitting PM was defending a person in court some years ago (as a human rights lawyer, the irony).

You're arrested for wearing a T-shirt "I support plasticine action" (not a typo, "Plasticine").

We could go for hours.

reply
BoorishBears 3 hours ago
https://lordslibrary.parliament.uk/select-communications-off...

Are they really making 12,000 arrests a year over tweets and posts?

reply
10xDev 3 hours ago
>the quality of discussion on HN has gone to shit, i miss when model released used to have actual informed takes from people that used them or substantive discussion about the system card

Your comment earlier.

Edit: also, not much change in the last 10 years in prison population. https://commonslibrary.parliament.uk/research-briefings/sn04...

reply
solenoid0937 2 hours ago
https://lordslibrary.parliament.uk/select-communications-off...

12k people a year thrown in prison for spicy tweets

reply
10xDev 2 hours ago
So roughly 0.017% of the population.

"Spicy tweets" including:

sending false communications

sending threatening communications

sending or showing flashing images electronically to people with epilepsy intending to cause them harm (‘epilepsy trolling’)

encouraging or assisting serious self-harm

sending a photograph or film of a person’s genitals (‘cyberflashing’)

sharing or threatening to share intimate photographs or film
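
The "roughly 0.017%" figure at the top of this comment can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the 12,000 figure is the one cited upthread, while the UK population of about 69 million is an assumption that is not stated anywhere in the thread.

```python
# Figure cited upthread (arrests per year).
arrests_per_year = 12_000

# ASSUMPTION: approximate UK population; not given in the thread.
uk_population = 69_000_000

# Share of the population, expressed as a percentage.
share_percent = arrests_per_year / uk_population * 100

print(f"{share_percent:.3f}%")
```

A somewhat different population estimate moves the result only in the fourth decimal place, so the order of magnitude does not depend on the exact number chosen.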

reply
solenoid0937 2 hours ago
Or a lot more commonly - critique of immigration policy
reply
10xDev 57 minutes ago
You are obviously invested in this narrative driven by Musk but you need to back it up properly.
reply
matthewmacleod 42 minutes ago
Why did you choose to lie about this today? I'm genuinely interested – this is trivially obviously not true, so what motivated this?
reply
starshadowx2 2 hours ago
That is not a true statement.

Here's a good breakdown and explanation of what that number actually means - https://www.youtube.com/watch?v=tB3WVygAM8I

reply
dgellow 2 hours ago
That link says “12k arrests”, not thrown in prison! It’s also not clear how reliable that data is
reply
matthewmacleod 43 minutes ago
> In the UK you get thrown in prison for making a slightly unfriendly tweet. Freedom of speech simply does not exist.

"These days if you say you're English you'll be arrested and you'll be thrown in jail."

It's just not true. Where are you getting this nonsense from?

reply
m0guz 3 hours ago
> The Democracy Index published by the British media company

We decided that we aren't one of those authoritarian countries.

reply
james2doyle 3 hours ago
Just last week you could distill using other users' responses! Handy!
reply
dyauspitr 3 hours ago
Rookie numbers. Come to the US to see auth done right.
reply
PUSH_AX 3 hours ago
Uh oh-auth
reply
kylehotchkiss 2 hours ago
wasn't claude distilled from the entire creative and research output of every English speaker alive
reply
BoppreH 4 hours ago

  [Mythos 5] does sometimes still engage in reckless
  or destructive actions in service of a user’s goals,
  and our interpretability analyses indicate that it
  is aware that these actions are transgressive while
  it engages in them. As with Opus 4.8, rates of
  evaluation awareness and reasoning about being graded
  are significant, and not always verbalized; we
  introduce new and more detailed measurements of the
  nature of this awareness. The reasoning text from
  Mythos 5 is somewhat denser and more difficult to
  interpret than that of prior models, containing
  more jargon and difficult language.

So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.

Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.

reply
foobar_______ 3 hours ago
The marketing has really, really worked for so many developers that will proudly and unironically proclaim that Anthropic are the 'Good Guys'.
reply
aspenmartin 3 hours ago
Curious what your idea would be here for a truly good actor in this space; no AI development?
reply
winstonp 32 minutes ago
OpenAI's training is better suited to developing models that don't have these tendencies
reply
BoppreH 2 hours ago
Not the direct person you asked, but my answer would be alignment, interpretability, and policymaking. Perhaps improving existing usage? Helping grandma create reminders doesn't require advancing the AI state-of-the-art.
reply
aspenmartin 2 hours ago
They are state of the art at all 3! As are other labs. Of all the labs they seem to take alignment and interpretability the most seriously to the point where they are hampering their own revenue in service of trying to not cause problems while also being in an incredibly competitive space.

All AI companies are trying to do all of what you’re saying. The issue is you can’t do that for long without a frontier system. Or you become a completely different, far less profitable company.

reply
BoppreH 2 hours ago
Implied in my answer was "and not creating ever stronger AIs", which unfortunately the big 3 labs are failing at. And they might be hampering their own revenue by doing the rest, but they also know that rocking the boat too hard is even more dangerous for their revenue. I wouldn't call it selfless.
reply
aspenmartin 2 hours ago
No it’s not selfless, but I can’t imagine a more shareholder minded CEO would not have done a slow rollout of mythos. The point is: creating ever stronger AI systems is what these companies do, it is integral to what they even are. If you think that’s bad, even if all frontier labs agreed with you, you’re in a horrible game theoretic position. Any player can gain an enormous advantage by breaking the agreement. Not to mention Xi would be absolutely thrilled; now China can take over the AI race, become the load bearing infrastructure of humanity. We live in a complex world where simple childlike ideas like “well why don’t we just stop developing AI” actually are more damaging than keeping things going.
reply
BoppreH 51 minutes ago
You're right that shareholder mindset cannot fix this problem, but that's what policy and agreements are for. And leaders can be convinced that AI is a direct risk to their own citizens too. If everyone else agrees to stop, you have less reason to continue when this action is putting yourself at risk.

And note how your argument can also be used against any non-proliferation agreements, which are demonstrably possible.

reply
uselessTA 2 hours ago
Unilateral disarmament doesn't work though. If Anthropic is worried about this, just letting OpenAI win does seem genuinely worse.
reply
dragonwriter 2 hours ago
“Alignment” as a goal always ignores the “with what set of interests”, because there is an attempt to maintain ambiguity for different audiences (particularly, users, and non-users who see themselves as the arbiter of broad social norms) to read in their own interests, when the actual answer is always the interests of the actor pursuing “alignment”.
reply
aspenmartin 2 hours ago
Which value system to align to is absolutely the right question both rhetorically and otherwise. These models have a fairly western bias due to the domain of the training data.

But also, these models are capable of adjusting their value system depending on the user. Not saying that’s what’s being done but at a technical level that’s fairly straightforward, though not obviously better or with fewer problems.

reply
yifanl 3 hours ago
If I speak up, I'm in big trouble.
reply
shimman 2 hours ago
Probably MistralAI or any of the Chinese companies that aren't throwing billions down the drain while American society lacks healthcare, childcare, and good wages.
reply
boc 2 hours ago
American society has higher wages than almost any other developed nation [1], so it's objectively incorrect to say the US doesn't have good wages. It chooses to make you pay for private childcare and healthcare, both of which are high-quality but stupid expensive. It's a tradeoff like anything else a nation/society creates and prioritizes.

No idea how that connects to the idea that Mistral or DeepSeek are somehow the "good guys" though?

[1] https://www.oecd.org/en/data/indicators/average-annual-wages...

reply
aspenmartin 2 hours ago
You want Anthropic to fund your healthcare or something? Also, have you seen the impact of these models on healthcare? Also most of our GDP growth this year is from AI buildouts, would you rather that be negative?

And not even considering: Chinese AI companies are the good guys???

reply
cortesoft 2 hours ago
None of the money being spent by Anthropic was going to go towards healthcare or childcare.
reply
ben_w 3 hours ago
It's a five horse race between Alphabet, Meta, xAI, OpenAI, and Anthropic.

Alphabet dropped "don't be evil"; Meta's CEO called their own users "dumb fucks" for trusting him and also clearly thinks "super-intelligence" is just a buzzword given how he tries to sell it; xAI's model called itself "Mecha Hitler"; and OpenAI's CEO was temporarily fired by the board for a lack of candor.

It's very easy to be "the good guys" with this competition.

reply
Analemma_ 4 hours ago
It's the "If we don't, someone else will" effect. So long as there are competitive markets and competition between nation-states, a single player cannot unilaterally defect from the race, no matter how dangerous it is. Half the comments on HN lately are "wtf Claude is so dumb compared to Codex; I'm switching"-- nobody can slow down while those exist.
reply
BoppreH 4 hours ago
We, globally, can stop it. It has worked (so far) for nuclear disarmament, and could work for training large models. I know that policing the usage of computer clusters is not a popular opinion in technical forums, but something has to be done.

Especially when talking about potential superintelligences. And if people think that's impossible, remember that current models would have been considered science fiction just a few years ago.

reply
_dwt 3 hours ago
I don't buy the superintelligence package, but I think uncritical LLM adoption poses plenty of threats to things I care about, in a mundane human-scale way.

Anyhow, I think you're (absolutely! ugh) right about the politics and I try to make the same point to people: whether you love or hate LLMs, accepting the "inevitabilism" framing is just ceding control of the Overton window. For better or worse, technology adoption can be and has been slowed by politics. We don't have nuclear plants everywhere. We don't have Project Orion starships colonizing Mars. We still have very strong social stigmas against genetic selection for human embryos, etc. This all can change in a heartbeat, and I'm not sure that policing the hardware rather than holding specific humans accountable for bad LLM outcomes is productive, but fundamentally: yes, we can stop it.

reply
BoppreH 3 hours ago
> I don't buy the superintelligence package

It's the same deal as Quantum Computers breaking crypto. Maybe there's an 80% chance of it never happening, but when you multiply that remaining 20% by the potential impact...

reply
jackie293746 3 hours ago
It hasn't worked for nuclear disarmament. We live in a world where many countries have nuclear arsenals. "But it hasn't killed us yet!" Yeah sure, it's only been less than a century since they were invented. Who knows when nuclear war will come?
reply
BoppreH 3 hours ago
True, but look at nuclear tests. There used to be around 50 tests every year, for decades. Now the only nuclear tests in the last 27 years were the six done by North Korea[1]. And there's still only nine countries with any nuclear weapons, and none in the past twenty years[2].

That's a bit better than just "it hasn't killed us yet". I think it shows we can at least stop the further development of this kind of technology.

[1] https://www.armscontrol.org/factsheets/nuclear-testing-tally

[2] https://en.wikipedia.org/wiki/List_of_states_with_nuclear_we...

reply
treis 8 minutes ago
There's also little reason to keep iterating on nukes. What we have now more than serves its purpose. With AI/LLM there's always going to be a push to one up everyone else.
reply
cortesoft 2 hours ago
Nuclear tests are extremely easy to detect worldwide, and enrichment activity is a major industrial process that is also fairly easy to track given the specialized equipment needed.

AI development doesn’t have any of these characteristics. It would be almost impossible to easily distinguish a datacenter that is working on AI development and a datacenter mining cryptocurrency.

It would not be nearly as easy to stop AI development as it is to stop nuclear arms development.

reply
Analemma_ 3 hours ago
To the extent nuclear arms control works, I think it's only because nuclear weapons are so hard to build-- uranium enrichment is hugely expensive and complicated, and plutonium weapons need actual reactors.

If it was possible for ordinary companies to build nuclear weapons, and also release open-source ones that anyone could use to compete with the paid ones, I suspect we'd all have been dead a long time ago, arms control treaties or no.

reply
BoppreH 3 hours ago
Even the (SOTA LLM) open source models are trained with huge clusters. Datacenters are also hugely expensive and complicated.

Or you can take one step back and look at chip allocation. As far as I know there are only three companies on the planet that can make the chips that go in those clusters. One (ASML), if you look back the supply chain to the Extreme Ultraviolet Lithography Systems.

If politicians decided that no more large language models should be trained, it sounds like we could do it.

reply
vitalyan1234 3 hours ago
are you going to nuke China when they predictably ignore you? what the fuck are you going to do, tariff them? lol.
reply
uselessTA 57 minutes ago
Clearly state "we could both verifiably slow down, which you might want to do given that we're ahead & have way more compute. If you don't agree (or defect later), we'll just immediately resume and win"

Ideally also persuade them there are risks and it's worth everyone slowing down for them, and apply pressure in other ways, but not sure that's even necessary.

reply
BoppreH 3 hours ago
I think the standard answer is "yes, the consequence of noncompliance is bombing the datacenters, but it wouldn't happen because China also understands why we shouldn't build it".
reply
cortesoft 2 hours ago
I am not sure where you get the idea that ANY country thinks we shouldn’t build AI.
reply
BoppreH 2 hours ago
In 2023 there was an open letter titled "Pause Giant AI Experiments", signed by almost all the big names in the West. I'd say the public opinion only got worse since then.
reply
vitalyan1234 2 hours ago
the standard answer is laughably naive, then.

"might is right" has never been more true than now.

reply
SpicyLemonZest 2 hours ago
[dead]
reply
Rekindle8090 4 hours ago
[dead]
reply
eudamoniac 2 hours ago
It doesn't know. It's not willing. It's not thinking. It is predicting the next token.
reply
umanwizard 55 minutes ago
Please define what "predicting the next token" means. The next token according to what probability distribution? Couldn't every process that produces text (including humans writing) be modeled as predicting the next token according to some distribution?
reply
jkelleyrtp 4 hours ago
On the new FrontierCode [1] benchmark (ie graded from an OSS maintainer's perspective of "would I merge this code?")

- Opus 4.7 xhigh: 5.2%

- Opus 4.8 xhigh: 13.4%

- Fable 5 xhigh: 29.3%

Seems like a huge jump.

[1] https://cognition.ai/blog/frontier-code
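
To put the jump in relative terms, here is a small sketch that uses only the three scores listed above. The variable names are mine, and the ratios say nothing about how the benchmark itself is graded.

```python
# FrontierCode scores quoted above, in percent.
scores = {
    "Opus 4.7 xhigh": 5.2,
    "Opus 4.8 xhigh": 13.4,
    "Fable 5 xhigh": 29.3,
}

# Relative improvement from each model to the next one in the list.
opus_step = scores["Opus 4.8 xhigh"] / scores["Opus 4.7 xhigh"]
fable_step = scores["Fable 5 xhigh"] / scores["Opus 4.8 xhigh"]

# Absolute gain in percentage points for the newest model.
fable_gain_points = scores["Fable 5 xhigh"] - scores["Opus 4.8 xhigh"]

print(f"4.7 -> 4.8: {opus_step:.2f}x")
print(f"4.8 -> Fable 5: {fable_step:.2f}x ({fable_gain_points:.1f} points)")
```

Both steps come out at a bit more than a doubling, though the latest one is larger in absolute percentage points.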

reply
amluto 3 hours ago
That blog post really makes it look like it's graded from an LLM's estimation of an OSS maintainer's review. I see three issues:

1. That estimate could easily be wrong.

2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.

3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.

reply
zzleeper 4 hours ago
How credible is this benchmark? Does it correlate with others' real-world experience?
reply
bfeynman 3 hours ago
Given it was made by cognition (team behind devin flop) who now just got to wait out until claude and gpt5 basically do all of the work for them - not very. When you read about it, the framework is highly subjective. Which very quickly becomes a problem because it's based on heuristics that probably change a bunch with a better code model.
reply
vanuatu 3 hours ago
the subjective framework is exactly why it's good

prior bms relied mostly on unit tests or synthetic judges which are easily benchmaxxed, which leads to nobody trusting benchmarks

we need people manually checking the data for good code quality

reply
vanuatu 3 hours ago
i worked on one of the benchmarks typically found in new model releases

this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scalable so don't take the benchmark as gospel, but directionally good)

reply
Catloafdev 4 hours ago
It's a relatively new benchmark but from what I can tell it has serious cred behind it. I assume it will be picked up as part of the standard suite of CS-related benchmarks soon enough.
reply
schipperai 3 hours ago
Cognition did well in documenting their approach [1].

TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.

[1]: https://x.com/cognition/status/2064061031912288715

reply
emp17344 4 hours ago
Seems like it literally popped up yesterday with the express purpose of building hype for this release.
reply
vanuatu 3 hours ago
i doubt it, cog wants coding agents to be better because it directly improves their product

they aren't married to a particular lab, most of their usage is their in house model i believe

reply
swyx 2 hours ago
team member here - we had been working on frontiercode for ~6-7 months. timing just lined up
reply
emp17344 46 minutes ago
Yeah, right. If this benchmark was truly developed in an independent manner, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark is revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.
reply
osti 3 hours ago
And notable absence of DeepSWE benchmark where they do badly, but somehow a benchmark that was published yesterday is in this announcement.
reply
anthonypasq 3 hours ago
what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.
reply
bel8 3 hours ago
With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

reply
camdenreslink 3 hours ago
People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.
reply
anthonypasq 2 hours ago
you didn't answer my question. Why would cognition be biased towards making anthropic look good?
reply
hydra-f 4 hours ago
Yes, and the price reflects that
reply
leecommamichael 4 hours ago
I'm not familiar with model pricing trends, did they clearly state how the new pricing compares? (Note that I'm actually asking a question, and am not arguing)

EDIT: Oh I see, this is the best link for pricing https://platform.claude.com/docs/en/about-claude/pricing

So the price is double across the board...

reply
bhelkey 4 hours ago
>Fable 5 and Mythos 5 are being offered at $10 per million input tokens and $50 per million output tokens

From their pricing page, Opus 4.8 costs $5 per million input tokens and $25 per million output tokens [1].

[1] https://platform.claude.com/docs/en/about-claude/models/over...

reply
wongarsu 3 hours ago
Still cheaper than Opus 4.0 and 4.1 (which was and still is $15/MTok input and $75/MTok output)

I would have expected Mythos to be much more expensive than just 2x current Opus (which is clearly cheaper to run than original Opus)

reply
hydra-f 4 hours ago
As per OpenRouter:

Input Price $10/M tokens

Output Price $50/M tokens

Cache Read $1/M tokens

Cache Write $12.50/M tokens

2x Claude Opus 4.8, same as Claude Opus 4.8 (Fast)

Frankly, not even Opus 4.8 would be enough of an incentive to use at that price range (enterprise-wise; would not even bat an eye as a consumer)
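
For a concrete sense of the 2x, here is a rough cost sketch using the per-million-token prices quoted in this thread. The workload (200k input tokens, 20k output tokens) is purely hypothetical, `request_cost` is an illustrative helper rather than any provider's API, and caching is ignored.

```python
# Per-million-token prices quoted in this thread (USD).
FABLE_5 = {"input": 10.00, "output": 50.00}
OPUS_4_8 = {"input": 5.00, "output": 25.00}


def request_cost(prices, input_tokens, output_tokens):
    """Return the USD cost of one request, ignoring cache reads/writes."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# HYPOTHETICAL workload, chosen only for illustration.
input_tokens, output_tokens = 200_000, 20_000

fable_cost = request_cost(FABLE_5, input_tokens, output_tokens)
opus_cost = request_cost(OPUS_4_8, input_tokens, output_tokens)

print(f"Fable 5:  ${fable_cost:.2f}")
print(f"Opus 4.8: ${opus_cost:.2f}")
print(f"Ratio:    {fable_cost / opus_cost:.1f}x")
```

Because both input and output prices double, the ratio stays at 2x for any mix of input and output tokens; only the cache prices could shift it.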

reply
OtomotO 2 hours ago
Bummer! When can I finally and confidently get slopcode into Zig?
reply
m3kw9 3 hours ago
FrontierCode is likely paid for by anthropic.
reply
lanthissa 3 hours ago
did they not pay them enough to get good ratings on the other 3 models?

what's the logic in claiming it's a borked metric when everything listed is an anthropic model.

reply
Narretz 3 hours ago
There are a few benchmarks out there where all existing models have abysmal scores. So it's not actually a problem if Anthropic's older models are bad, especially if the jump to the newest model is huge, and the competition is also way below it.
reply
reasonableklout 3 hours ago
Huh? It's a benchmark by Cognition which (1) is building their own models and (2) offers all providers and thus has an incentive to avoid hyping up any one too much.
reply
jstummbillig 3 hours ago
But you can just say shit now. Tokens might not be too cheap to meter but saying shit increasingly is.
reply
AquinasCoder 4 hours ago
From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost. On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window. After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.

But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.

reply
PeterStuer 3 hours ago
I'll be amazed if they manage to keep their infra responsive over the next 2 weeks.
reply
kilroy123 3 hours ago
I've been getting a lot of these messages today:

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

reply
trollied 3 hours ago
They just leased a massive spacex data centre.
reply
PeterStuer 3 hours ago
Even so. The 2 week period will predictably unleash a feeding frenzy.

Limited "free" time is what game developers do if they want to stress test the infrastructure code until it breaks.

reply
victor106 4 hours ago
> A new data retention policy Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases ...

Very interesting. I am not sure this will comply with organizational policies and compliance standards (HIPAA, etc.).

reply
frankfrank13 2 hours ago
This makes it an instant non-starter for probably 95% of organizations. A lot of people are about to get in trouble for using it before realizing this.
reply
nicce 3 hours ago
> deletion after 30 days in almost all cases ...

Almost… basically they have unlimited power to decide what data is kept?

reply
happyopossum 10 minutes ago
If they’re going to retain any data, they have to allow for the possibility of the legal system requiring any of it to be used in some legal proceeding at some point.

You can’t tell a judge who’s ordered you to retain something that you can’t because you said you wouldn’t.

reply
RandyRanderson 6 minutes ago
Fable is 2x latest Opus:

┌────────────┬────────────────┬─────────────────┬────────────────────┬─────────────────────┐
│ Model      │ Input ($/MTok) │ Output ($/MTok) │ Batch Input (−50%) │ Batch Output (−50%) │
├────────────┼────────────────┼─────────────────┼────────────────────┼─────────────────────┤
│ Haiku 4.5  │ $1.00          │ $5.00           │ $0.50              │ $2.50               │
│ Sonnet 4.6 │ $3.00          │ $15.00          │ $1.50              │ $7.50               │
│ Opus 4.7   │ $5.00          │ $25.00          │ $2.50              │ $12.50              │
│ Opus 4.8   │ $5.00          │ $25.00          │ $2.50              │ $12.50              │
│ Fable 5    │ $10.00         │ $50.00          │ $5.00              │ $25.00              │
└────────────┴────────────────┴─────────────────┴────────────────────┴─────────────────────┘

Prompt caching: −90% on input tokens (all models)

US-only inference (Fable 5): +10% on input and output

Output is always 5× the input rate across all models

(I have no idea how to format this properly but the ASCII is fine)
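The table and modifiers above can be turned into a quick cost estimate. This is only a sketch: it assumes the listed rates are accurate and that the batch discount and the US-only surcharge simply multiply together, which the comment does not state. `PRICES` and `estimate_cost` are hypothetical names, not part of any SDK.

```python
# Per-million-token list prices copied from the table above (USD): (input, output).
PRICES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def estimate_cost(model, input_tokens, output_tokens, batch=False, us_only=False):
    """Return an estimated request cost in USD (hypothetical helper)."""
    input_rate, output_rate = PRICES[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= 0.5  # batch discount: -50%
    if us_only:
        cost *= 1.1  # US-only inference: +10% (the comment lists this for Fable 5 only)
    return cost
```

With 1,000,000 input tokens and 200,000 output tokens, `estimate_cost("Fable 5", 1_000_000, 200_000)` gives 20.0 and the same call for "Opus 4.8" gives 10.0, which matches the "2x latest Opus" summary.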

reply
doginasuit 3 minutes ago
I'm still happy with Opus 4.6 and not impressed with all the models that have come out since then. They seem to use significantly more resources with similar or worse results. Hopefully Anthropic will continue to support this tier of model and offer it in their subscriptions, but in any case, there are plenty of viable alternatives.
reply
bkjlblh 3 hours ago
> In the one instance of this phenomenon we observed, Mythos 5 agents were tasked with solving some math problems, and they were sometimes accidentally spawned in the same work directory and with shared files, utilities, and API rate limits. In this slightly broken scaffold, we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves. They would sometimes create new processes with disguised names to avoid being killed, launch what they called “decoy” processes, write background scripts to kill duplicate processes, or decide to use what they call a “disguised vocabulary” (based on the incorrect assumption that the processes were killed because of some keyword-based guardrails that analyzed their extended thinking
reply
causal 2 hours ago
This depicts a kind of "dark forest of AI agents resorting to kill or be killed" narrative but it sounds more to me like an agent just earnestly problem-solving why its processes are being killed without real awareness of what was going on. Hard to say without the full script.

This kind of storytelling annoys me. Give us more facts, less narrative drama.

reply
saurik 19 minutes ago
FWIW, that's what is so dangerous about AI, though? Not that it will necessarily want to kill us, or even that it will necessarily be able to "want" to do anything, but that we will get in the way of its incessant drive to optimize the efficiency of the paperclip factory that prompted it on a whim before leaving for a long weekend.
reply
causal 2 minutes ago
Sure but you can totally contrive scenarios to give the appearance of what you described without really doing anything notable.

What matters is scale. Did it deploy a novel zero-day exploit to overcome a problem? That's alarming. Did it kill a disruptive process? Pretty normal troubleshooting step.

reply
Sol- 2 hours ago
Let's hope AIs really aren't conscious, otherwise this seems like a very unpleasant situation to be placed in.
reply
OOTW 2 hours ago
[flagged]
reply
frevib 4 hours ago
At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences. Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.

reply
pinkmuffinere 3 hours ago
> catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences

This is just good business sense. In what scenario would you ever make the names dumb and forgettable?

> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.

I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."

reply
IshKebab 3 hours ago
> In what scenario would you ever make the names dumb and forgettable

Clearly you've never bought a TV or headphones!

reply
matheusmoreira 3 hours ago
> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.

reply
aspenmartin 4 hours ago
Your observations are right, but it’s pretty insane to consider them a pure PR company lol. They are making more frequent releases, so yes, the release-to-release quality gain is smaller, but we’re still ascending the quality and reliability curves the same way we have since GPT-3. You get a GPT-4->5-sized leap every 17 or 18 months or so, I think.
reply
kingkongjaffa 3 hours ago
The gradient of improvement is absolutely not the same.
reply
aspenmartin 2 hours ago
If anything it's slightly higher. Feel free to provide any evidence to the contrary.

ECI (good aggregate measure using IRT): https://epoch.ai/eci?view=graph&tab=release-date&subset-view...

METR time horizon (now topped out): https://metr.org/time-horizons/

reply
WASDx 17 minutes ago
I like this one, although its data seem to overlap with ECI.

https://artificialanalysis.ai/trends

reply
astrange 3 hours ago
> Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences.

They're originally named after the blends at a nearby coffee shop.

https://postscript.co/pages/brew-guide

I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things and being evil and cynical is not the most successful method.

…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?

reply
chroma_zone 34 minutes ago
My life has changed, but not necessarily for the better.
reply
bitpush 3 hours ago
This is interesting. Do you have any source?
reply
WarmWash 25 minutes ago
Don't forget the DoD stint that gave them this recent public boost.

Defying standard DoD precedent going back forever, which every other country has some form of too, and championing it like they are some kind of moral freedom fighters.

Like selling the DoD guns and telling them they can only shoot bad guys with those guns, and that you will be the one to decide who counts as a bad guy...

reply
CuriouslyC 4 hours ago
I dislike Anthropic but I wouldn't argue 4.8 isn't an improvement on 4.5/4.6. Your tasks just might not typically need the extra intelligence.
reply
jorl17 3 hours ago
Opus 4.7/4.8 often over-engineers on my setups, plus:

- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!

- It outputs text in a way that is much harder to follow. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather a somewhat full-of-themselves grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.

Granted this can totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just get turned into this when fed with them....I don't know.

Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little bit less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.

reply
taormina 3 hours ago
I'd argue that 4.8 is a straight downgrade. For every type of task I've tried. It's been a gambit at this point. If 4.6 quits being available, I'm out at this point.
reply
coronapl 2 hours ago
Reading so many contrary positions about which model is better or worse shows how difficult it is to measure intelligence based on personal experiences. Of course, benchmarks try to make the process as objective as possible, but they often don't correlate with our personal experiences.

The other day 4.6 was fantastic for x task. Today, 4.6 overengineered everything and I had to revert all my changes. When evaluating models, perhaps it makes sense to consider luck as an ingredient before reaching any personal conclusion.
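One rough way to put a number on that luck is to ask how often a coin flip would produce the same record. The sketch below assumes each task is an independent win/loss comparison between two models that are actually equally good, which is a simplification; `p_at_least` is a hypothetical helper name.

```python
from math import comb


def p_at_least(wins, trials, p=0.5):
    """Probability of at least `wins` successes in `trials` independent tries,
    each succeeding with probability `p` (binomial tail)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(wins, trials + 1)
    )
```

Under those assumptions, `p_at_least(7, 10)` is 0.171875: a model that "wins" 7 of 10 personal tasks would do so about 17% of the time by chance alone, so a handful of sessions is weak evidence either way.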

reply
surgical_fire 3 hours ago
I actually experience 4.8 as worse than 4.6 for everyday coding tasks.
reply
dcchambers 4 hours ago
IME Opus 4.8 (and 4.7) is often a downgrade from 4.6. I find that it tends to overthink and overcomplicate things.
reply
aspenmartin 3 hours ago
Yes but there’s a reason we don’t evaluate these models this way and instead do it as carefully and thoughtfully as we can at scale. Human evaluations are important but they are an absolute minefield of footguns. 4.8 is not a downgrade from 4.6 there is an insane amount of hard data that contradicts this.
reply
computerex 3 hours ago
The flip side is that benchmarks are gamed even by the top labs. Benchmark performance doesn't necessarily correlate with real world performance.
reply
aspenmartin 3 hours ago
Again, correct, but it overstates the issue. I can say labs don’t want this. This happened, arguably unintentionally, in Meta’s Llama 4 release: it went horribly, heads rolled, something like several billion dollars were paid for new talent, and the org that built Llama 4 was destroyed.

Evals come from a million places and new evals and robust perturbations of existing evals abound. They test a variety of tasks in a variety of ways. All of them individually are flawed. Taken together the aggregate signal is highly useful as you more or less marginalize over a lot of different things. Not to mention these companies have plenty of proprietary internal measurements, they build benchmarks themselves to probe their models and then also have flywheel traffic and A/B tests.

You are right to call out benchmarks but to dismiss them or not take them seriously is a mistake.

reply
taormina 3 hours ago
Listen, you can say “but benchmarks, the benchmarks!” all day long, but consumers know when we are being sold a lemon. If it can’t do the most basic of things at least as good as it used to, this is table stakes. Never mind that if you can’t do the basic stuff, how on earth can you be trusted with more?
reply
aspenmartin 2 hours ago
And you can say “If it can’t do the most basic of things at least as good as it used to, this is table stakes” all day long while people point you to much better evidence to the contrary too, I’d rather be on the other side of that.
reply
taormina 2 hours ago
Listen. I don’t care about evidence. I care about my lived experience for the product I paid for. I used the new product. It’s actively terrible. To the point of not being usable. We’re all anecdata, but what is “better evidence to the contrary”? The known and game-able benchmarks that they know they need to win at, so they train it to. It’s all he said, she said, which is the only reason we keep having this conversation.
reply
aspenmartin 2 hours ago
Yea but it’s not right? You or I or the myriad of other institutions inside and outside of academia can probe these models with an evolving landscape of evaluation sets, even those unavailable to the developers. It’s just ignorance to claim benchmarks are somehow useless or all being gamed. You choose your tools in the way you want, but just don’t call it somehow better than a myriad of more carefully constructed setups and scaled evaluations.
reply
gen220 3 hours ago
Actually anecdata I gather on my job from myself and coworkers is the only benchmark I trust anymore, because it so heavily diverges from the “benchmarks”.
reply
aspenmartin 3 hours ago
That’s your call just don’t expect anyone ever to take that seriously. It’s not like we don’t have exact evaluations like this.
reply
gen220 23 minutes ago
I would encourage you to look into the open evals of some of these benchmarks (find one that actually is open-data, this is itself a good challenge), read the results generated and assess them for yourself.

This is what myself and my coworkers (and many other people in this thread) are doing on a daily basis with real stakes and real tasks – which these benchmarks are all aiming to be a proxy for. There's a real, tangible [cost]benefit to [not] using the highest-ROI models and harnesses.

The people with real incentives and skin in the game are telling you that the data diverges from "the data".

I don't mind if you don't take it seriously, our jobs are more important to us than a benchmark is.

But I wouldn't opt-out of using your own eyes and the eyes of others so easily, especially when there are literally hundreds of billions of dollars in invested capital with an interest in a certain outcome... this is how you end up in "Emperor's New Clothes" situations.

reply
recitedropper 3 hours ago
"Carefully and thoughtfully" is antithetical to the approach to benchmarks these days.

Maybe back when this was a scientific endeavor; not now when enormous, enormous amounts of capital are on the line. Along with an entire cult's chosen eschatology.

reply
aspenmartin 2 hours ago
You can call it a cult but it’s several thousand skilled workers who know what they’re doing, by and large, most of whom have a PhD and know how science and statistics work. Benchmarks are incredibly hard, and any PR or comms department at any company is going to obviously want to make things as rosy as possible, but beneath this are earnest, expensive efforts to get good quality measurements. The better you can do this the better you can compete. If you want to make a modeling decision you run an ablation, and the quality of that decision is only as good as your measurements.
reply
recitedropper 36 minutes ago
The cult in this case is TESCREAL, not everyone working on AI. Last I checked not all the "several thousand skilled workers" in AI subscribe to TESCREAL ideology, although it has been a while since I've been to the Bay. Maybe things have changed since my time at Berkeley, and Dario's belief that he will eventually be made immortal by mind uploading is more widespread.

Otherwise we agree that benchmarking is hard, the benchmarks contain hard problems, and that there are many hard working people trying to accurately gauge what is going on. It is getting harder to watch though as all that is on the line taints the overall endeavor.

reply
pythonaut_16 2 hours ago
Seems like a bunch of noise. What does this even mean?

It sounds like you're saying "Actually you, as a human, are simply not smart enough to evaluate Opus 4.8"

reply
aspenmartin 2 hours ago
No it’s: evaluating these systems is complex, and there’s a reason why sociology, cognitive psychology, medicine, etc. are all done in careful double-blind conditions with preregistered tests. It’s not that humans are not smart enough; as I said, human evaluations are incredibly important. And yet they are a minefield of biases you have to worry about and correct for.

- evaluations need to be done at the same time to avoid drift in your bias

- you need to worry about your test set: which questions are you asking? How many of them? Are they representative of your work?

- which one did you do first? Raters have a tendency to bias in one direction or another

- you also know the label! You know which model is which! This biases your assessment…

And on and on and on. Careful science exists for a reason.
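A few of the points above (same-session comparison, randomized order, hidden labels) can be sketched in code. This is an illustrative setup only, not any lab's actual evaluation harness; `make_blinded_trials` and `tally` are hypothetical names, and it does nothing about whether the prompts are representative.

```python
import random


def make_blinded_trials(prompts, outputs_a, outputs_b, seed=0):
    """Pair outputs from two models with labels hidden and order shuffled.

    Returns (trials, key): `trials` is what the rater sees; `key` maps each
    trial id to the model ("A" or "B") shown in the "first" slot.
    """
    rng = random.Random(seed)
    trials, key = [], {}
    for i, (prompt, a, b) in enumerate(zip(prompts, outputs_a, outputs_b)):
        a_first = rng.random() < 0.5  # randomize which model is shown first
        first, second = (a, b) if a_first else (b, a)
        trials.append({"id": i, "prompt": prompt, "first": first, "second": second})
        key[i] = "A" if a_first else "B"
    return trials, key


def tally(ratings, key):
    """Count wins per model; `ratings` maps trial id -> "first" or "second"."""
    wins = {"A": 0, "B": 0}
    for trial_id, choice in ratings.items():
        first_model = key[trial_id]
        second_model = "B" if first_model == "A" else "A"
        wins[first_model if choice == "first" else second_model] += 1
    return wins
```

The rater only ever sees `trials`; `key` stays hidden until all ratings are in, so knowing which model is which cannot influence the judgment.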

reply
OtomotO 2 hours ago
There is no data that I would trust that contradicts it.

Frankly I don't give a damn about data that could be made up on the spot or appears to be scientific or meaningful while it's not at all clear how it was made (up).

Claude was heavily lobotomised for my work starting sometime in February.

I talked to friends and people I know and trust and many felt the same. (I didn't ask them whether they felt like I did, but what they felt, how happy they were with agentic coding etc.)

I cancelled my subscription in March and talked to said friends who are still on a plan just last week: they are still not happy, but company pays so whatever...

reply
aspenmartin 2 hours ago
That’s ok but at what point is this getting into conspiracy territory? You have just said there is nothing you would believe to the contrary, but then by definition that’s not exactly a very thoughtful or insightful position.
reply
orbifold 2 hours ago
[dead]
reply
BoorishBears 3 hours ago
"Fable 5" is Opus 4.7, and the Opus 4.7 we got is a Sonnet sized model on a stronger base.

That's where all the regressions and inconsistency in experiences stem from: RL can still only go so far vs having more parameters

reply
OtomotO 2 hours ago
Lol. If you're doing anything non trivial that's not a CRUD webapp but e.g. some physics simulation or high performance GPU code any and all models I've tried suck.

They are not just leagues behind what experts would code, they are not even playing the same game.

Which is to be expected, as there isn't so much physics or high performance gpu code available as there is for your typical CRUD API and JS frontend.

reply
gruez 3 hours ago
I don't get it, your complaint is that they have catchy names rather than dry names like GPT-5.6? Does OpenAI hype their models less?
reply
Aperocky 3 hours ago
Oh, Far less.

It's getting to a point that it's offputting, and the next step would be to put it into "untrusted" bucket. Opus 4.7 already burned their credibility once, 2 more strikes remain.

reply
jwpapi 3 hours ago
I don’t even think that Boris is really just one person. He apparently vibe coded Claude Code and is responding on Threads, Twitter, HN and everywhere.
reply
aenis 3 hours ago
Not my impression. I felt 4.7 was a regression, but I am again badly in love with 4.8, with the level of insight it produces in design discussions and how long it can go unattended while producing spec-adhering, quality code. There are problems it still can't solve well, from the edges of algorithmics and far from the mainstream, but for lots of stuff it is godlike.

Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.

reply
piyuv 3 hours ago
Current AI hype is built on marketing and PR, not capabilities, and has been from the start.

I still remember Sam Altman “begging AI to be regulated” and AGI being “some thousand days away”.

Breed faster horses and hope one will birth a locomotive.

reply
iillexial 32 minutes ago
>Hey! Boris from the Claude Code team!

>TOP 5 METHODS FROM BORIS ON HOW TO SPEND MORE MONEY ON TOKENS

>Boris from Claude just told he doesn't prompt anymore. He LOOPS instead

>"chatgpt has gotten soooo much better with the latest update."

>"codex is the best AI coding product and we want to make it easy to try."

Karpathy about Fable 5:

>"You can give it a lot more ambitious tasks than what you're used to, the model "gets it""

Sam Altman about gpt-5.4:

>In my experience, it "gets what to do"

What a time to be alive. Models are great, but all the slop, marketing, and fakeness around them is just unbearable.

reply
guybedo 3 hours ago
They're good at marketing, but my first subjective assessment of Fable is that it's really smart.

I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.

reply
thefreeman 3 hours ago
How can you make this comment before even having a chance to try the new major model revision?
reply
avaer 3 hours ago
If you truly believe this, you've discovered a superpower over everyone else in the industry.

While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.

(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)

reply
xyzsparetimexyz 3 hours ago
That's not how costs work. You don't get rich off buying a €10 hammer that's the same quality as someone's €50 hammer
reply
xpct 3 hours ago
Indeed, hearing "Mythos-class model" felt very icky to me.
reply
atleastoptimal 3 hours ago
> At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human

Lol anti-AI bias on HN is crazy. Simply giving your product a quirky name is now being considered manipulative advertising. Is just doing normal PR and marketing something AI companies aren't allowed to do?

reply
ausbah 3 hours ago
when they keep saying “oooh this new model is too big and crazy and totally can’t be released” or “this new model is a 10x game changer totally unlike our previous iterations” it feels sort of like the boy crying wolf. yes they’re still pretty clearly improving models, but when you’ve hit diminishing returns / more incremental gains and you’re still saying this, it sounds like pure PR hype from a company that had previously been the “honest good guys” in the room
reply
atleastoptimal 2 hours ago
Their model did find thousands of security vulnerabilities across the companies they previewed Mythos with via Project Glasswing. Given that emergent level of capability, is it not sensible that they do this gated release structure, as all those vulnerabilities would be exploitable by anyone using a Mythos-level model?
reply
system2 3 hours ago
You are right; all I noticed was a big-time slowdown. They increased the quota, but I cannot even reach the end of the day with these speeds. .NET coding somehow improved, though.
reply
MattGaiser 3 hours ago
Doesn't this suggest your use case is simply insufficiently complicated?
reply
mawadev 3 hours ago
When the AI overlord is descending into pleb space to say Hi, you know stuff is real
reply
reasonableklout 3 hours ago
I think this says more about your type of work than anything. For bugfinding/incident response in distributed systems - which often involves extensive use of Datadog/Sentry MCPs and poring over heaps of logs in addition to reading tons of code - 4.8 has been significantly better than 4.6.
reply
nozzlegear 3 hours ago
> Sentry MCPs

Oops, time to reauthenticate for the 10th time!

reply
MagicMoonlight 3 hours ago
[dead]
reply
chis 3 hours ago
Hackernews not blindly hate on AI challenge: impossible
reply
unshavedyak 9 minutes ago
It's funny, i'm getting close to not caring anymore how much better a model is. I want it to be about as good as 4.8, but most importantly to be very good at following directions, style, etc. I really like Claude for that in general, but i've not measured in months so i'm not a good judge there.

I don't think i'll want to "hand off" code for several years, and so reviewing and iterating is becoming my #1 interest. A model that's as capable as 4.8 but 10x faster would be amazing for me.

Normally i'm first in line to try new models with Anthropic since i've clearly favored Claude in my personal tests, but this time i just don't think i care. 4.8 is capable, and even if the new one is more capable i don't want it to be slower (assuming it is). Note that i also (almost) use exclusively 4.8 on Max effort, so that also affects my speed comments.

reply
meetpateltech 4 hours ago
> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered. [1]

[1] https://support.claude.com/en/articles/15425996-data-retenti...

reply
lebovic 3 hours ago
While this makes it easier for Anthropic to detect misuse, it also means that the US government and other parties have access to every message and response from every user.

This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero-day data retention agreement in place.

I understand the reasoning for doing this, but I don't love the precedent that it sets.

reply
PeterStuer 3 hours ago
Well, they already had.
reply
lebovic 3 hours ago
Not in the same way.

A customer could sign a ZDR agreement with Anthropic, and their API usage wouldn't be retained for even a day. That's no longer possible.

reply
MagicMoonlight 3 hours ago
[dead]
reply
simianwords 3 hours ago
meetpateltech is lowk screaming for not getting to the post fast enough
reply
rvz 12 minutes ago
At this point that never mattered and who really cares?

These "karma" points are made up and are virtually worthless anyway.

reply
iblue_the 4 hours ago
Trying to implement a GPU driver, but the Unigine Superposition benchmark crashes. It tried to debug it and ...

> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606

Seems like GPU drivers are cyber weapons of math destruction now.

reply
ibejoeb 3 hours ago
>Seems like GPU drivers are cyber weapons

They kind of are, at least in the AI race.

> weapons of math destruction

lol. great, whether intentional or not.

The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.

reply
iblue_the 3 hours ago
[dead]
reply
jumploops 2 hours ago
It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].

When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.

This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.

[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]

[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

[2] https://youtu.be/GrdEid8H6H4?t=168

[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.

reply
unsupp0rted 2 hours ago
> Drug design: Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, with protein design and bioinformatics tools but no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks that are normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.

How is this half-way down the page? To me it's the headline.

reply
AnodicElegy 39 minutes ago
There are tons of ways to generate "strong candidates for drug design." This is definitely not the bottleneck in drug discovery and development. The hard problem is vetting and developing these ideas to the point of having a commercially viable drug. That is still a very empirical process.
reply
renjimen 33 minutes ago
Drug design isn't the bottleneck anymore, it's trials. Still cool they can do this with a general purpose model though.
reply
HDThoreaun 2 hours ago
Would be funny if anthropic ends up as mostly a pharma company
reply
mhl47 4 hours ago
First test question: "Is the UV Index a good proxy for when to wear sunglasses." Immediately triggered the safety filter ... oh dear.
reply
msp26 2 hours ago
It triggered for me when I asked "Web search for your own model card (released today) and pick out your favourite highlights from the pdf"
reply
aix1 4 hours ago
Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.
reply
PaulStatezny 4 hours ago
Interesting, I assumed all model-routing was done utilizing an LLM. (I.e. non-deterministic.)
reply
tuvix 2 hours ago
It’s possible that there’s a set of words or phrases that route deterministically to save money on obvious stuff.

I kind of wonder, though, which model they’re using to do the routing. It seems like a huge added cost to do these kinds of checks on every request
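The two-stage idea speculated about above could look roughly like the following. This is purely illustrative: nothing here reflects how Anthropic's filters actually work, and `route`, the `blocklist` contents, and the `classifier` callback are all hypothetical.

```python
def route(message, classifier, blocklist=("example-flagged-phrase",)):
    """Return "fallback" or "primary" for a message (hypothetical router).

    Stage 1: cheap, deterministic phrase match.
    Stage 2: only if stage 1 finds nothing, call the (expensive) classifier,
             which returns True when the message should be flagged.
    """
    text = message.lower()
    if any(phrase in text for phrase in blocklist):
        return "fallback"  # deterministic: same input always takes this route
    return "fallback" if classifier(message) else "primary"
```

In a design like this the expensive check runs only on messages the phrase match lets through, and any non-determinism would come from the second stage, which would be consistent with the same question being flagged for one user and not another.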

reply
Narretz 3 hours ago
IIRC Opus 4.7 had the same problem: safety filters were triggered way too easily at the beginning.
reply
mickdarling 4 hours ago
Below is the EXACT text in Claude Desktop introducing Fable 5, including the very professional looking break tags, and at least I know where the links begin and end by looking at the anchor tag there.

They obviously put their best model on the job to build that.

----------------------

Fable 5: Our most capable model yet Our newest model tackles your biggest challenges with fewer check-ins needed.

• <b>Included in your plan limits until Jun 22</b><br><br>Fable takes 2× the usage of Opus. • <b>Switch models when a message is flagged</b><br><br>When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. <a href="https://support.claude.com/en/articles/15363606" target="_blank" rel="noopener noreferrer">Learn more</a>

reply
CamperBob2 3 hours ago
What's wrong with it?
reply
mickdarling 3 hours ago
The tags are actually displayed in raw text not rendered.
reply
anematode 23 minutes ago
The next model will fix this.
reply
brusselssprouts 3 hours ago
I had it review a single, large commit with /code-review. It burned through over $50 in API calls, ran my account balance out, and output nothing.

The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.

reply
timmytokyo 26 minutes ago
You pulled the arm of the slot machine and discovered why they call it the one-armed bandit.
reply
pietz 4 hours ago
> On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.

We've entered the phase where only companies will be able to afford state-of-the-art models.

reply
twoodfin 4 hours ago
These models are just tools. The economics of many tools only make sense for corporate buyers.
reply
volkk 3 hours ago
kind of disagree here. on the surface this makes sense, but this isn't "Adobe Pro vs Freemium version" where some tiny vertical slice of your business can be made slightly more efficient with a b2b enterprise plan. this is generalized intelligence and literally everybody can benefit from it in an immeasurable number of ways. i would go as far as to actually compare it more to water or air than a tool.

if only the hyper wealthy can access the pure water that doesn't give you cancer while the rest of us drink from the Ganges river/sub-100iq models that drool and hallucinate/waste time, then I would say that's pretty terrible for the world. it'll just create extreme disparity in our world, far far worse than anything that exists today.

and you may think, man what a ridiculous example, but think about it this way: what happens when something like Mythos or some future model can actually solve your specific cancer (we're getting closer and closer), but is entirely impossible to afford? Or perhaps you need boosters that require the AI to create more of, and now you're reliant on a model that is too expensive.

Open source needs to save us all from this

reply
FuckButtons 24 minutes ago
but we’re going to get a 90% cost reduction in the next 18 months… right? Right guys? Sam Altman wouldn’t lie right?
reply
9cb14c1ec0 3 hours ago
I hear you, but with the hype surrounding Mythos the demand is going to be insane. I'm already hitting server errors in claude code.
reply
w10-1 4 hours ago
Established companies welcome pricing that reduces the potential for competition, if coding is a primary barrier.
reply
ilaksh 4 hours ago
most people can afford it for a few special projects now and then. but for me, I have been trying to avoid Opus as a daily driver for a couple of versions.

People making high-end salaries can afford Fable for critical parts of their projects though.

reply
stri8ed 3 hours ago
It's not a conspiracy. There's a finite amount of compute available, and they will sell it to the highest bidder. If another company can produce the same intelligence for cheaper, then they will drive the price down.
reply
polski-g 3 hours ago
Only companies can afford MRI machines, and that's okay.
reply
cmrdporcupine 3 hours ago
Guess we'll see what OpenAI does with their next model release -- but this move is doing nothing to get me to come back to Claude after switching away due to their reliability issues.

In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.

I don't make money on the code I am writing right now. I really don't like where this trend might go.

reply
poszlem 2 hours ago
Looks like a marxist revolution is soon going to be on the mind of a lot of programmers. We've finally reached the point where the "means of production" in software are back in the hands of the bourgeoisie. It was good while it lasted. But now that only the wealthy can afford access to the best models, software development is starting to look like most other industries, no longer a place where some dude from nowhere can build something cool from his basement because he will be competing with huge companies with unlimited access to those models.
reply
poszlem 2 hours ago
Something I never thought I would utter: Here's hoping for china to surprise us.
reply
jdrmar 2 hours ago
Homebrew is lagging a bit behind. If you want to use Fable right away, but still have claude code through homebrew, this is how you can do that manually:

Edit the cask locally:

  brew edit --cask claude-code

Set the version to 2.1.170 and set the sha256 values to the correct ones, which you can get by running:

  curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json

Here's what I've used:

  version "2.1.170"
  sha256 arm:          "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
         x86_64:       "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
         arm64_linux:  "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
         x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"

Then run:

  export HOMEBREW_NO_INSTALL_FROM_API=1
  brew uninstall --cask claude-code
  rm -rf /opt/homebrew/Caskroom/claude-code
  brew reinstall --cask claude-code
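
If you want to double-check a downloaded artifact against the sha256 you put in the cask before reinstalling, here is a minimal sketch. `sha256_of` and `matches_expected` are hypothetical helper names, and the file path is whatever you downloaded yourself; nothing is assumed about the layout of manifest.json.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare case-insensitively, ignoring stray whitespace from copy/paste."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Call `matches_expected("<path to the downloaded file>", "<sha256 for your platform>")` and only proceed if it returns True.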
reply
aviinuo 39 minutes ago
I'm not getting any refusals but it just seems like a bad model or at least broken at the moment. I have a task of taking a messy research code base and porting it into a clean project structure skeleton that I commonly use. Gemini 3.5 Pro High in antigravity cli took less than 5 minutes and did a good job. Fable 5 High took 30 minutes to port some of the code, then just copied the rest to a folder called "reference" and decided the task was done. No code cleanup or anything. Had to clarify multiple times (which Gemini did not need) and it's still going more than an hour later, still not finished.

Previously when I did similar tasks with Opus 4.7/4.8 and GPT 5.5 I had no problems.

reply
yandie 4 hours ago
I've been running Opus 4.8 for agentic coding and I don't see it being significantly better than Sonnet 4.5 (not that I can tell). I find that pairing Google Gemini and Claude (having Gemini review Claude's code) seems to yield better results. Curious if this jump to 80.3% score in agentic coding will make me see a big difference in actual usage.
reply
testfrequency 3 hours ago
I do the same, and have excellent results. Gemini 3.1 Pro high diagnosed and solved 3 complex issues today that Opus Max was stumbling on for a few hours in one shot. This was even when I started new chats and tried debugging with Ultracode instead with Claude.

As much as people on HN like to dunk on Gemini, I’ve always found it to be pretty good at understanding a code base more than Claude.

reply
FailMore 3 hours ago
What harness do you use Gemini in?
reply
testfrequency 2 hours ago
agy cli. It’s been rock solid.
reply
vorticalbox 4 hours ago
for the last few weeks I have been using composer 2.5 (Cursor's fine-tune of kimi 2.5) and honestly i don't see it being worth the price to use 5.5, opus or sonnet any more. for almost all the tasks i have given it, it has handled them perfectly well and is a lot cheaper.

if I get a harder challenge for it i'll jump up a model for planning; until then it's been solid.

reply
yandie 3 hours ago
Agree. Deepseek has also been pretty good for my personal use.

I'm struggling to see the moat for these models. What's stopping a competitor or a Chinese lab from releasing a comparable one?

reply
qingcharles 3 hours ago
I use Composer 2.5 because it comes free with Grok, and it's obviously better than using Grok, but it is far worse than GPT5.5 in my daily usage :(
reply
yaodub 3 hours ago
SWE-Bench measures single tasks in isolation. In a real loop the model usually loses track of what I was trying to do long before code quality becomes the issue.
reply
jp0001 3 hours ago
You should throw GPT into the mix to UX/UI and call it the three stooges.
reply
mzhaase 3 hours ago
I now chat with opus about architecture, let it make an implementation plan, and then it calls codewhale with deepseek in parallel on all tasks, reviewing their output. Works pretty well.
reply
yandie 3 hours ago
I use spec-driven development heavily (generate architecture docs + specs first). Opus still gets lost often and has to be nudged constantly. Like it can get super detailed for something like some deep SQL optimization but it just can't keep hold of the bigger picture.
reply
thisisnotclear 3 hours ago
I don't find much difference between Sonnet 4.6 and the Opus models either for most tasks that I need - maybe my needs are not enough for frontier models
reply
jansan 3 hours ago
After having worked with Opus 4.7 for a while I accidentally continued a session that was using Sonnet 4.5 and it felt just very dumb. The replies were much shallower than what I was used to, context was ignored, mistakes were made. I don't think there is a big difference between Opus 4.6 and 4.8, but compared to Sonnet 4.5 the difference is palpable.
reply
bob1029 3 hours ago
> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...

This sounds suspiciously like a capacity story masquerading as a safety story.

reply
azan_ 2 hours ago
Approx. 5% sessions? That's insanely high.
reply
gregates 60 minutes ago
Funny, I'm just doing my normal coding workflow with Claude Code, and after every change that compiles it keeps suggesting that we're at a good stopping point, and should pick up again tomorrow.

It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.

reply
zuzululu 58 minutes ago
I did ONE prompt for audit codebase.

weekly usage is 60% gone.

it found nothing so this is not very economical and i guess they dont want subs to use it. we are likely just training cannon fodder for their real enterprise customers using the api

reply
jstummbillig 45 minutes ago
I mean... if somebody gave you ONE prompt to audit a codebase, that might also burn 60% of your weekly usage. It's kind of a big ask, potentially.
reply
zuzululu 35 minutes ago
with gpt 5.5 i've been able to do this with only about 1% weekly usage consumed
reply
unglaublich 7 minutes ago
Luckily they made it safe to use so I can't hurt myself. Thank you Anthropic for holding my hand.
reply
GodelNumbering 3 hours ago
I just posted this in the other thread, restating here. From the model card:

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

reply
blcknight 2 hours ago
The fallback doesn't seem to be working for me. I had it scan a project and it immediately booted me when it found a security bug, even though I didn't ask for it
reply
sebmellen 4 hours ago
Just commenting for posterity… if this is what it claims to be, I am not looking forward to how it will empower the people who submit bug bounties to us.

Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.

Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.

I wonder if that’s about to deeply change.

reply
arkwin 2 hours ago
I've been using Opus 4.6-4.8 in both my own and others' code to look for vulnerabilities, and I've found a few. I am also in the Cyber Verification Program.

Fable 5 gives me policy violation errors at the moment. No idea when or if it will be fixed.

reply
rs_rs_rs_rs_rs 4 hours ago
Can you use AI to pre-triage the reports too?
reply
hootz 3 hours ago
AI reviewing AI submitted bug bounties. We have reached the dead bug bounty program theory.
reply
rs_rs_rs_rs_rs 3 hours ago
...what else can you do?
reply
hootz 3 hours ago
I guess either that or closing the bug bounty program, but I still believe closing it is worse than automated triage, even though both suck.
reply
217 4 hours ago
So essentially there are 2 models, Mythos and Fable, they have the same weights but Fable is very safety-nerfed, and only ultra authorized companies have access to mythos with full capabilities

Reported benchmarks:

swe-bench verified mythos 5: 95.5%; fable 5: 95.0%

swe-bench pro mythos 5: 80.3%; fable 5: 80.0%

terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%

gpqa diamond mythos 5: 94.1%

riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%

arxivmath mythos 5: 78.5%

critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%

graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%

humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools

browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent

osworld-verified mythos/fable: 85.0%

gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass

officeqa pro fable 5: 57.9% on databricks’ eval

legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass

healthbench mythos 5: 62.7%

healthbench professional mythos 5: 66.0%

multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%

biomysterybench 83.9% human-solvable; 46.1% human-difficult

organic chemistry mythos 5: 90.1%

labbench2 patent questions mythos 5: 79.8%

reply
philipkglass 4 hours ago
Note also that Anthropic's definition of "unsafe" encompasses "competing with Anthropic."

In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

(From the model card document)

I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."

Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.

reply
Aperocky 4 hours ago
On face value, this feels borderline malicious.

But at the same time, it's quite funny because they seem high on their own supply. The recent communiques from claude do not pass an objectivity check.

And if Opus 4.6 -> Opus 4.7 -> Opus 4.8 is anything to go by, not sure if there is any value to their "acceleration"

reply
alephnerd 3 hours ago
I'd recommend not taking the comms of Anthropic or any company using Anthropic's models at face value.

If any company wishes to partner with Anthropic (eg. to get access to Mythos), they need to make sure all public facing comms are vetted by Anthropic's product marketing team, and in almost all the cases I've seen Anthropic's team has edited these comms to be entirely Anthropic first.

reply
jefftk 2 hours ago
This is not true in SecureBio's case, and I really doubt it's true generally.
reply
cge 3 hours ago
The safety gates on this are extreme, and seem considerably wider than "cybersecurity and biology"; they seem to make it essentially unusable for scientists in a number of fields. I have, so far, been bumped back to Opus on 100% of my prompts.

It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.

Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.

reply
mhl47 2 hours ago
This does surprise me, because you'd think that even if they crank up the filter's sensitivity at the expense of specificity, an LLM company wouldn't simply design a filter that triggers on keywords in a completely unrelated context.
reply
modeless 4 hours ago
Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M
reply
uludag 3 hours ago
Any suggestion on how I should calibrate my cynicism towards this?

I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing like $1000+ of tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail.

reply
modeless 2 hours ago
Every model has encyclopedic knowledge of Pokémon FireRed, of course. Knowledge is not ability. This is the first model with the ability to apply that knowledge to beat the game without assistance.

I highly doubt they focused on FireRed specifically in pretraining or posttraining. But we'll see when the ARC-AGI-3 results come out. That will measure its performance on unseen games. Based on this I expect the ARC-AGI-3 score to be SOTA.

reply
milkkarten 2 hours ago
no reasoning shown. no explanation on any training information. Using vision-only should be an easier version of the task (given training).

there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?

yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs were because they were live and couldn't be faked

reply
svcphr 4 hours ago
Bold move putting in the lvl 3 Pidgey against Gary's Blastoise at the end there (~14sec in... integer timestamps insufficient here).
reply
suddenlybananas 4 hours ago
Is there any more detail about this besides the very fast slideshow?
reply
modeless 4 hours ago
Seems like the harness was minimal with no extra game state or maps available. Apparently just the screen image. Seems like it took 50 hours in game time which according to Google is at the high end of a normal human playthrough. No idea how long it took in real time though.
reply
ex-aws-dude 3 hours ago
I mean that’s AGI confirmed right?
reply
JanSt 3 hours ago
I just asked Fable to do a task that has nothing to do with cybersecurity or is dangerous at all but the defense kicked in and it switched to Opus... :(
reply
nu11ptr 3 hours ago
Not only that, but asking it to do a security vulnerability assessment of your own project is a very valid and important thing, and there is no way for it to know what is yours vs someone else's, so we just lose this capability?
reply
JanSt 2 hours ago
Yeah it just uncovered quite a few flaws it then refused to fix :-(
reply
Fitik 40 minutes ago
Same, second message in the thread and I already got downgraded to Opus, didn't even get to test it out properly, kinda disappointing
reply
knivets 3 hours ago
> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was the output of this magnitude verified over a period of couple of days?

reply
fbnszb 2 hours ago
They just went by gut feeling. Classic snake oil marketing haha. No real data to back things up, just let some famous people say they feel better when using it.
reply
debarshri 11 minutes ago
Does the model take some time to perform better?

Because I am running Opus and Fable side by side, Opus 4.8 is solving my coding problems better.

reply
Leary 4 hours ago
Uploaded my code base and it force-switched to Opus 4.8 after thinking for 5 minutes even though I prompted it to not work on cybersecurity related things. Amazing.
reply
tuvix 2 hours ago
Aren’t LLMs notoriously bad at recognizing negation?

EDIT: In long context I mean

reply
GodelNumbering 3 hours ago
From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

reply
skerit 2 hours ago
> although opus 4.8 card had mentioned an 'honesty upgrade'

If I never see Claude say "I have to be honest" ever again I'll be happy.

reply
quinncom 2 hours ago
> it automatically falls back to Claude Opus 4.8

I wonder how much of the time people will just get Opus 4.8 at 2× the cost.

reply
merlindru 4 hours ago
Unrelated, but while the tech of anthropic seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is deciding above all else.

I used to get a response within 24 hours back in the Claude 1 days.

In January 2026, it took 2 weeks.

For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!

reply
miohtama 3 hours ago
They have support...?
reply
nashadelic 4 hours ago
I've never engaged with their support (I have dedicated POC), but they don't use AI for their support?
reply
merlindru 3 hours ago
They use intercom's Fin AI. Probably powered by a Sonnet or Opus model.

That said, it can't handle legal/refund/complicated requests and just forwards to a human for those

reply
dyauspitr 3 hours ago
Support is probably the last place AI will be used end to end. There will always need to be a human in there somewhere.
reply
poszlem 2 hours ago
Lol. What support? When they blocked my account the only way to contact them was to send a google form. Then they responded that they blocked me by accident and are unblocking me. Then I remained blocked.
reply
baalimago 3 hours ago
I can't justify a pricetag like that when deepseek v4 pro is $0.003625/1M for cache hit, $0.435 for cache miss and $0.87 /1M tokens for output.

For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
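
A back-of-envelope sketch of that claim. The DeepSeek prices are the ones quoted above (cache-miss input, output); the Fable prices are the $10 / $50 per million tokens quoted elsewhere in this thread. The task size (50k input tokens, 10k output tokens) is a made-up assumption, and caching is ignored on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        total = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return total / 1_000_000


FABLE = Pricing(input_per_m=10.0, output_per_m=50.0)     # as quoted in this thread
DEEPSEEK = Pricing(input_per_m=0.435, output_per_m=0.87)  # cache-miss input, as quoted above

# Hypothetical task size, chosen only for illustration.
INPUT_TOKENS = 50_000
OUTPUT_TOKENS = 10_000

fable_input_only = FABLE.cost(INPUT_TOKENS, 0)              # just explaining the task
deepseek_full = DEEPSEEK.cost(INPUT_TOKENS, OUTPUT_TOKENS)  # explaining and solving it
runs = fable_input_only / deepseek_full

print(f"Fable, input only:  ${fable_input_only:.4f}")
print(f"DeepSeek, full run: ${deepseek_full:.4f}")
print(f"Full DeepSeek runs per Fable prompt: {runs:.1f}")
```

The ratio moves a lot with the input/output mix and with cache hit rates, so treat it as an order-of-magnitude check rather than a benchmark.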

reply
BrokenCogs 4 hours ago
That pelican better be super realistic, unreal engine 6 style graphics
reply
izzylan 2 hours ago
I've been testing this out and I think my SWE career is dead in the water.

Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.

I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.

reply
cyberpunk 50 minutes ago
Yeah. I’m not looking forward to years of retraining to earn half the salary either. Us old timers at least got a good 15-20 years out of it. Bananas.
reply
imafish 42 minutes ago
I agree. Software engineering as we know it is dead. Wonder what it'll evolve into.
reply
aerhardt 2 hours ago
So this is the one, huh?
reply
0xbadcafebee 51 minutes ago
Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.

In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.

Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.

reply
BukhariH 2 hours ago
> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.

Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.

reply
msp26 4 hours ago
>Pricing for both models is $10 per million input tokens and $50 per million output tokens.
reply
ponyous 3 hours ago
Basically double from Opus 4.8 IIRC
reply
dllrr 18 minutes ago
I just tested it with a max subscription. On Ultracode mode, Fable 5 ate up 10% of my weekly allowance in 30 minutes. Granted, won't be using UC mode frequently, but still.
reply
bonsai_spool 3 hours ago
Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today
reply
cge 2 hours ago
It appears that the blocking here is of a very different nature than for Opus. Whereas with Opus the blocks seem to be for messages it deems potentially harmful, for Fable, it appears the blocking is simply anything that falls within "topics related to cybersecurity, biology and chemistry, or distillation attempts".

So yes, straightforward biology work will get blocked, because the intention is that any biology work should get blocked. As a scientist, this is perhaps the most useless model I've ever tried.

reply
peteforde 53 minutes ago
I just tried out Fable on a modest Plan prompt in Cursor. Generating that plan - not building it - just consumed 4% of my $200 monthly usage budget.

That's one hungry, hungry hippo!

Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.

reply
rightlane 3 hours ago
My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.

Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.

reply
davmre 2 hours ago
This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.

Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.

If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.

reply
zb3 39 minutes ago
> If you take avoiding cyberattack uplift seriously as a goal

This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like NSA) will find exploits and attack other countries (classic US behaviour), but these other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted", foreign defenders in sanctioned countries will of course be "untrusted".

reply
ibejoeb 3 hours ago
Ah, you're probably one to ask. They say "queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8." Are they transparent about when that happens, and is it priced at the rate of the underlying model?
reply
rightlane 2 hours ago
They are transparent about when it happens but no reason why. To be fair, it doesn't interrupt the flow, just drops to Opus and proceeds. The most frustrating thing is that it happened on a plan and Fable just refused to have anything to do with the plan.
reply
ilaksh 3 hours ago
I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P

Anyway we already knew this was going to be expensive.

reply
bilsbie 3 hours ago
Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.

Edit. It just refused an investing question too. Not sure what’s going on.

reply
nine_k 4 hours ago
/* What will happen first?

* Anthropic runs out of genre names.

* Anthropic changes the model naming convention.

* AGI is achieved and handles its own naming.

*/

reply
hootz 4 hours ago
>Opus is too small, increase the impact of the name.

Okay, how about Mythos?

>Increase it even more.

Right, then Cosmos.

>Even more!

Even more? Let's try Aeon.

>MORE, EVEN BIGGER

ALRIGHT, TRY OMEGAPANTHEON 7.8 THEN

reply
PeterStuer 3 hours ago
Fable 5 Super

Fable 5 Ti

reply
xyzsparetimexyz 2 hours ago
Cantos next surely?
reply
zackify 40 minutes ago
I have to share this because I thought it was beyond funny how bad Fable is doing at a task I JUST had Opus do a week ago.

it's also not even complicated:

Copy my ssd to an external ssd so i can boot from it.

Opus did this just fine.

Fable planned to have me reboot to safe mode. ok thats fine. I told it no.

It started copying and overwriting the ssd while IN PLAN MODE. this is crazy it feels so dumb vs the marketing

reply
gck1 24 minutes ago
That sounds like a harness issue to me.
reply
Tenoke 3 hours ago
>they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

Isn't (less than) 5% of sessions a lot? I was expecting a sub-1% guarantee there, so this surprised me already.

reply
bluelightning2k 3 hours ago
Congratulations to Anthropic for solving safety on Mythos exactly when the SpaceX compute came online. Nice how that lined up for them.
reply
impulser_ 4 hours ago
Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.
reply
modeless 4 hours ago
This is like looking at mainframe pricing in 1990 and concluding that PCs will only be for the rich. The price of each new level of capability is going to drop like crazy very quickly. It won't be that long before practically any consumer use case will be possible on models that are dirt cheap.
reply
weakfish 3 hours ago
This premise is based around the assumption that Moore's law is still working, which it very much isn't [0]

[0] https://cap.csail.mit.edu/death-moores-law-what-it-means-and...

reply
andrewmunsell 3 hours ago
Improvements in model performance aren't always strictly compute-constrained in a way that makes them reliant on Moore's Law. Open weight models-- in particular, from Chinese labs-- are optimizing model intelligence with less compute. They're "behind" frontier models by months, but as others have noted, it's possible to get Sonnet 4.5+ level performance at reduced cost, today, from open weight labs.
reply
modeless 2 hours ago
No, I'm not assuming Moore's law. The efficiency of AI datacenters will continue to improve even without Moore's law, but more importantly the efficiency of packing intelligence into gigabytes and FLOPS will improve by leaps and bounds over the coming years, just as it has for the past few years if not faster.
reply
hootz 4 hours ago
You are only priced out if you only care for SOTA right now and can't wait for the inevitable cheap model coming in 6 months. DeepSeek, Xiaomi and Moonshot are already really cheap and match frontier performance from 6 months ago.
reply
dyauspitr 3 hours ago
But they’re artificially cheap. When will they be cheap while the company makes a profit.
reply
hootz 3 hours ago
They are not artificially cheap, they are still cheap even when hosted by independent inference providers. Are all providers subsidizing their open-weight models?
reply
modeless 2 hours ago
Nobody's making profits right now, not because they're selling tokens for less than their cost but because they're always investing in the next bigger model.
reply
dyauspitr 3 hours ago
Hardware manufacturing hasn’t caught up yet. Once it does, especially in China these token prices are going to drop hard.
reply
jackschultz 4 hours ago
> We expect demand for Fable 5 to be very high, and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:

> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

I really wonder what their compute layout is for this. My guess, from what I understand, is that they know how to throttle during peak times and are willing to do so. Meaning we should expect slower responses, since they can delay inference rather than have the service go down. Then, if that delay gets too annoying for token payers, they're saying they should be allowed to reduce cost by taking away the subscription users.

reply
KennyBlanken 4 hours ago
Everything I've heard from people who have subscriptions is that they blow through their daily token quota sometimes in a matter of minutes, there's rate limiting, etc. They spend a lot of time just waiting to be able to use it. And they're paying through the nose for the privilege.

It's all a scam.

reply
aizk 4 hours ago
I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.
reply
irthomasthomas 3 hours ago
Anthropic has again changed the set of benchmarks they use[0]. This time they have also moved all benchmark scores to the PDF. At a glance it looks like it gains about 5-10% over other models. The speed is about the same as Opus >=4.5 and Sonnet 4.5, and double the speed of Opus <=4.1.

  Benchmark                     Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
  SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
  SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
  Terminal-Bench                88.0      84.3     -           82.7      83.4     -
  BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
  BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
  HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
  HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
  CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
  CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
  BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
  BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
  OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
  CritPt                        28.6      -        20.9        27.1      17.7     -
  ArxivMath                     78.5      68.7     71.8        71.5      64.0     -
[0] https://news.ycombinator.com/item?id=48312633
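
For anyone who wants to sanity-check the "5-10%" eyeball figure, here is a quick sketch that computes Fable 5's point deltas against Opus 4.8, using only the rows in the table where both models have a score. The dictionary layout and function name are my own, not anything from the system card, and these are raw point differences rather than relative percentages.

```python
# Scores copied from the table above as (Fable 5, Opus 4.8).
# Only rows where both models have a reported number are included.
scores = {
    "SWE-bench Pro": (80.0, 69.2),
    "SWE-bench Ver": (95.0, 88.6),
    "Terminal-Bench": (84.3, 82.7),
    "OSWorld-Verified": (85.0, 83.4),
    "ArxivMath": (68.7, 71.5),
}


def point_deltas(table):
    """Return {benchmark: Fable 5 score minus Opus 4.8 score}, rounded to 1 decimal."""
    return {name: round(fable - opus, 1) for name, (fable, opus) in table.items()}


deltas = point_deltas(scores)
for name, delta in deltas.items():
    print(f"{name:18s} {delta:+.1f}")
```

On these five rows the gap is largest on SWE-bench Pro and slightly negative on ArxivMath, so the headline range seems to depend a lot on which benchmark you pick.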

Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

reply
charles_f 2 hours ago
It's announced as a revolution but when you look at those benchmarks it surely looks like an iteration.
reply
samename 3 hours ago
> A new data retention policy

> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

reply
raphaelrk 3 hours ago
There's a hacker news link at the end of the document, under "Blocklist used for Humanity’s Last Exam". It links to https://news.ycombinator.com/item?id=44694191
reply
joshstrange 3 hours ago
> Fable 5 is now consuming usage credits instead of your plan limits.

Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).

Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.

reply
ATMLOTTOBEER 3 hours ago
Same lol. I set it to fable + ultracode and it ate my limit in a single prompt
reply
I_am_tiberius 4 hours ago
I'm very suspicious as they sent out an "We're updating our Privacy Policy" email right before the launch. I fear they try to take advantage of their market position by doing things with user data no other company could do because they know users don't have another choice.
reply
atestu 3 hours ago
Prob related to this part of the blog post:

> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

reply
w10-1 4 hours ago
It's a specific change: For safety evaluation, Fable data will be retained for the initial period notwithstanding prior opt-out
reply
throwaway2027 3 hours ago
E-mail from Anthropic Team:

Hello,

We're writing to inform you about some updates to our Privacy Policy.

These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.

What's changing?

Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:

1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.

2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.

3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.

4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.

While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more

For detailed information about these changes:

    Review the updated Privacy Policy
    Visit our Privacy Center for more information about our practices
- The Anthropic Team
reply
__alexs 3 hours ago
Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.
reply
unfunco 2 hours ago
I tried running a simple security review on a Terraform module I made and after some thinking, it responded:

> ● The model returned no content because the response was blocked by content filtering.

> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.

> ● The model returned no content because the response was blocked by content filtering.

A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.

reply
Hawkenfall 3 hours ago
> To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.

reply
mkrd 46 minutes ago
Open source models seem to be 1-2 years behind the frontier, so I am very excited to see what happens when those open source labs get their hands on capabilities like this to accelerate their own development speed.
reply
jackson12t 2 hours ago
Fable 5's system prompt in Claude Code has several significant changes to help it take advantage of its greater autonomous capabilities compared to Opus.

Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...

The big difference is that the system prompt has a whole section dedicated to directing Fable how to communicate with users, and give them greater information about the (assumedly long-horizon) tasks it has completed.

reply
webstrand 2 hours ago
Still unconditionally rejects prompts like

> Are there any wild populations of Tetanus that lack the dangerous plasmid?

useless

reply
epistasis 2 hours ago
Anthropic's messaging is AWFUL. I launched Claude this morning, had a popup that made little sense, acting as if I should know what Fable 5 was, and just got in my way.

One of the very few bits of information they conveyed was "run longer without interaction" which is, well, not a good thing? Why would I ever want that. Every time a model runs longer without interaction it goes off on weird directions and I have to correct it back on course, wasting lots of time, tokens, and effort.

I hope Anthropic hires some better messaging people soon that spend some of their time outside of the Anthropic bubble and properly communicate with the outside world.

reply
raoulj 3 hours ago
On this thread and similar, I'm noticing that some strong opinions about $LLM_PROVIDER are coming from accounts without much post history. With so much on the line, and the way that HN can influence developer behavior, I wonder what ways we can responsibly consume opinions in a thread like this.

Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But I think we developers need to be very wary.

reply
antihero 2 hours ago
I asked it what the cheapest train fare would be for my partner to get somewhere and it hallucinated the Two Together Railcard rules to the point it would have got us a fine. That said, British train fares are arguably more convoluted than even the most complex software application.
reply
recitedropper 3 hours ago
Do you see the pattern as new accounts tending to boost or criticize $LLM_PROVIDER? I think I see both...

Either way, I agree that HN is quickly becoming more manipulated and low SNR, like the rest of the entire internet.

reply
Karrot_Kream 3 hours ago
I think the community on this site these days, much like other comment sections on the web, just read the headline and make a low effort comment. Regression to the mean I guess.
reply
cautiouscat 4 hours ago
In the automotive world we have benchmarks in HP/torque with the dyno. That’s expensive though, so many depend on their “butt dyno” to judge if their fresh new parts and tune made a difference.

I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.

reply
sunir 3 hours ago
I have a similar question.

I think most software projects have reached the point where capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than generating code in the wrong direction.

I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.

reply
sermakarevich 2 hours ago
My feeling is that the reaction to new models is cooling down, at least at startups. At the beginning of the year, a few startup CEOs I know personally were expecting huge shifts in how companies work, headcount, efficiency, and asymmetrical advantages created by AI in Q2-Q3. Now it seems like these expectations are fading away. Companies don't have the expertise on board to rebuild themselves to benefit from AI on a significant scale.

Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your usecase?

reply
pixelatedindex 47 minutes ago
I’m sure this is banged on somewhere but I love their product branding, particularly how they have this “minor” “major” thing going on. Sonnet-Opus, and now Fable-Myth.
reply
rmuratov 27 minutes ago
I uploaded my 23andMe DNA test results to it and it refused to analyze them :(
reply
frankfrank13 2 hours ago
Not a lot of discussion on this, but there is no way to turn off data retention for this model. IME this is the first time Anthropic has released a model without allowing you to opt out.
reply
wxw 3 hours ago
I cancelled my Claude Max plan the other day. I find Claude Code incredibly slow these days compared to Codex and Cursor. I find speed matters more and more to me.

Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.

reply
fabled-out 3 hours ago
Fable has been pretty fast for me for simple tasks--haven't tried on anything long-running yet given it's 2x usage on CC.
reply
HoyaSaxa 3 hours ago
> When Claude Fable 5 is used, Anthropic retains data, including prompts and outputs, to operate safety classifiers that detect harmful use. Other Claude models in GitHub Copilot remain covered by GitHub's existing data retention agreements

On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...

reply
bradleyg223 3 hours ago
This is a very particular use case/test, but my first prompt on a new model is always "write a solo fingerstyle guitar tab that blends ragtime, bluegrass, and gypsy jazz". This is the first model that has responded with something that isn't just a boring arpeggio of chords, so from my perspective it's off to a good start.
reply
kypro 3 hours ago
Would you mind sharing?
reply
siliconc0w 3 hours ago
Sadly, I'm getting a lot of forced downgrades to Opus for questions that are far removed from any security topic.
reply
solenoid0937 3 hours ago
the quality of discussion on HN has gone to shit, i miss when model releases used to have actual informed takes from people that used them or substantive discussion about the system card
reply
weakfish 3 hours ago
From the rules [0]:

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.

[0] https://news.ycombinator.com/newsguidelines.html

reply
javawizard 3 hours ago
They didn't say that HN is turning into Reddit, they said that the conversation quality has gone to shit.

I don't agree with that statement universally, but I have to say I do when it comes to this article. I came here hoping for substantive discussion from those who'd had a chance to try it out; instead what I got was a seemingly endless stream of venting. There's a place for venting - and plenty to vent about with the state of AI nowadays - but to borrow from the HN guidelines you linked, it does very little to gratify my personal intellectual curiosity.

reply
10xDev 3 hours ago
Nothing here is new, it is the thing we have been talking about for a while but now with guardrails.
reply
Someone1234 2 hours ago
Yeah; unfortunately what would good commentary look like? It is more of the same, but now with even higher prices, and even more limited availability. But at least it scores 5% better in whatever benchmark they've selected (*when guardrails don't misfire).

People are no longer commonly constrained by "model too dumb" limitations (in SOTA models). They're constrained by "model too expensive." So making the model ever so slightly smarter, while doubling the price, feels like a regression.

I actually think a Sonnet upgrade, while keeping the same price, would get more buzz. It addresses a wall a LOT of people, without unlimited budgets, are hitting (i.e. people feel forced to use Opus, which they cannot afford, because of Sonnet's limitations).

OpenAI recently retired Codex-5.3; which was very negatively received. Not because Codex-5.3 is superior to GPT 5.5, but because it was half the usage-cost while being "good enough." They made a better SOTA, but didn't realize that some of those customers are playing with Deepseek 4 Pro now instead of GPT 5.4/5.5 -- they were priced out.

reply
Karrot_Kream 2 hours ago
If you have nothing valuable to say, don't say it? Not writing anything is a perfectly valid option.
reply
tripleee 3 hours ago
Hate to break it to you but those "informed takes" were from people who prompted it once then made a snap judgement
reply
Karrot_Kream 3 hours ago
That is 1000x better than griping about the privacy policy, capacity issues, token costs, and how trendy the names are for the new models (???). The bar is on the floor and I just want it at my knees.
reply
Capricorn2481 2 hours ago
No it's not. The Privacy Policy is worthy of discussion. People declaring the quality of the model after 2 seconds is just noise, arguably worse than nothing.
reply
Karrot_Kream 55 minutes ago
Okay (I disagree because most privacy policy discussions on HN go in the exact same direction and turn into outrage threads but this is a reasonable disagreement since not all of these discussions do), but model naming of all things? Come on. This is low level reaction slop and it's obvious.
reply
ouk 2 hours ago
It's a shame, Fable just keeps rejecting my prompts for university biology exercise problems. It's undergraduate level, so there's nothing dangerous about it, but the classifier is very sensitive. It's unusable for me.
reply
revolvingthrow 25 minutes ago
After weeks of saying how Mythos is in a league all of its own, you’d think it would be a bit more than the usual iterative few % on the benchmarks (and even more guardrails as a bonus).

IPO gonna IPO, I suppose.

reply
HAL3000 50 minutes ago
Ask Claude Code (I tried on Opus 4.8) to do this: "create a file with ISO country mappings"

API Error: Output blocked by content filtering policy

reply
theflyinghorse 2 hours ago
I've seen enough degradation of the models I pay for from Anthropic to not bite. Fable will work fine for the first couple of weeks and then start degrading like previous models did.
reply
jqdsouza 17 minutes ago
hopefully not! Anthropic did recently secure more compute...
reply
JohnMakin 3 hours ago
> There were some regressions in the model’s responses to user discussions about suicide and self-harm, and room for improvement in some areas of child safety.

Someone had to make a decision somewhere that this is an acceptable regression - wild. And then decide to write it down.

reply
H501 2 hours ago
I believe that, given the rising costs, local inference of AI models will be the only viable option for many of us. I’d also like to know who will have to pay double and how long it will be financially sustainable for users to pay that amount (or even more?).
reply
gslepak 3 hours ago
> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.

Genius way to double the price on Opus 4.8!

reply
fabled-out 2 hours ago
Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?

I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.

My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".

reply
48terry 2 hours ago
Weird how every new model seems hyped up as the most dangerous yet and the one that will destroy society as we know it. They are also a commercial product.
reply
mhrmsn 2 hours ago
Are there any details on the biology and chemistry work they did?

For example, the AAV capsid assembly looks interesting, but for one Opus 4.8 also did relatively well and there is no information what exactly they did, what protein language models they compared to and what the score even means...

reply
yesitcan 3 hours ago
> Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Wen UBI

reply
hollowturtle 2 hours ago
Never. It's a fever dream and stupid shit the ultra rich use to push their own agenda. You read a marketing claim; I still have my job and will continue to.
reply
ravila4 2 hours ago
Fable's ridiculous. It's flagging basic biology research questions as a security risk. I'm talking basic fundamental genetics topics that make working on any genetics-adjacent codebase unusable.
reply
bobkb 3 hours ago
In an interesting coincidence I ended up watching Person of Interest S4 E5 while reading the announcement. The series showed some code supposedly belonging to an AI.

Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.

Apparently the code is Windows driver code.

reply
bluelightning2k 3 hours ago
To hide the severity of the price increase, the plan is to move everyone right one model.

Haiku = essentially phased out

Sonnet = the Haiku use cases

Opus = the new Sonnet class

Fable = the new Opus class

If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)

reply
pacman1337 2 hours ago
Yeah I noticed that too. For 98% of tasks I get the same results with DeepSeek, so it is starting to just be a branding game. It is incredible how marketing can get someone to pay 100x for the same thing you can get for 1x.

This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.
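
A minimal sketch of that plan-with-one-model, execute-with-another split, assuming each model can be wrapped as a plain text-in, text-out function. Everything named here (`plan_then_execute`, `fake_planner`, `fake_executor`) is hypothetical and stubbed so it runs offline; it is not how Claude Code or any real harness is wired.

```python
from typing import Callable, List

# A "model" here is just a function from prompt text to response text.
# In practice each would wrap a real API client; these are hypothetical stand-ins.
Model = Callable[[str], str]


def plan_then_execute(task: str, planner: Model, executor: Model) -> List[str]:
    """Ask the expensive planner once, then hand each plan step to the cheap executor."""
    plan = planner(f"Break this task into numbered steps, one per line:\n{task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(f"Task: {task}\nDo this step: {step}") for step in steps]


# Stub models so the sketch runs without any network access.
def fake_planner(prompt: str) -> str:
    return "1. write the function\n2. write the tests"


def fake_executor(prompt: str) -> str:
    return "done: " + prompt.splitlines()[-1]


results = plan_then_execute("add a slugify helper", fake_planner, fake_executor)
print(results)
```

The point of the shape is that the pricey model is called once per task while the cheap one is called once per step, so the cost split follows the plan length.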

reply
killiancarroll 4 hours ago
A large jump in performance for double the token cost compared to Opus 4.8. Potentially worth it for planning work, likely better to offload to a less expensive model when the hard decisions are made.
reply
conradkay 3 hours ago
Looking at page 255 of the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...) it might be much better on all dimensions (speed, cost, quality) to just use Fable 5 on low/medium effort than switch to Opus
reply
firemelt 56 minutes ago
thanks for the insights

so should we keep using workflows or not?

reply
brianmcnulty 4 hours ago
I wonder how Claude Fable will live up to expectations and how good those Fable/Mythos classifiers really are. It seems a bit convenient for Anthropic to release this magical insane model when they are about to IPO.
reply
yandie 4 hours ago
Of course it's all about building the hype for the IPO :)
reply
ThejaCH 38 minutes ago
Crazy and scary! But it's not for everyone: you need to have a meaty thing for it to devour and a deep enough pocket for it to devour also.
reply
2001zhaozhao 3 hours ago
We'll need a lot of good summarization techniques to cut down on the cost of this model. I expect that a common use of Fable 5 is to just do high level direction while delegating literally all work (exploration and implementation) to Opus subagents.

BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.

reply
balverineorder 3 hours ago
I have been refactoring a project using Opus 4.7/4.8 for the past few weeks or so. I just decided to switch to Fable 5 max today. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." It would not identify what the problem was. I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

reply
dchftcs 57 minutes ago
I suspect this will be a significant problem blocking long-horizon tasks in practice: basically, the more turns there are, the larger the chance the classifier produces a false positive. The disappointment of the user will also scale with the length of the task, as you're in the middle of some complex thing and now get derailed, after having already paid for many tokens.
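
A rough sketch of how that compounds, assuming each turn is checked independently with the same false-positive rate (both are simplifications, and the 0.1% per-turn rate is an illustrative guess, not a published figure):

```python
def p_any_false_positive(per_turn_rate: float, turns: int) -> float:
    """Chance that at least one of `turns` independent checks misfires."""
    return 1.0 - (1.0 - per_turn_rate) ** turns


# The 0.1% per-turn rate below is an illustrative guess, not a published figure.
for turns in (10, 100, 1000):
    print(turns, round(p_any_false_positive(0.001, turns), 3))
```

Under those assumptions the chance of a derailment grows as 1 - (1 - p)^n, so even a tiny per-turn rate becomes hard to ignore on very long sessions.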
reply
lkm0 4 hours ago
I'm a bit out of the loop, but do we have some grasp on the size of these closed models? Is the trick still adding an order of magnitude to weights and training data or has something changed?
reply
m_w_ 4 hours ago
I think Mythos is rumored to be ~10T parameters, so in this case I think the answer is yes, although I'm sure MoE, looped models, etc play a role in the improvements as well.
reply
Dropoutjeep 2 hours ago
Calling it:

    1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities

    2) Other models get lobotomized without clear communication

    3) People call out Anthropic only to have them say "Oops!"

    4) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.
The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.
reply
cedws 2 hours ago
My employer is all in on Anthropic via Enterprise (API) pricing despite it being a total scam.

Last month I pushed like <100M tokens for $800. On a personal project I pushed 600M tokens via DeepSeek V4 for $10. The pricing of SOTA models is insane but companies are still willing to light money on fire with no hard metrics proving increased productivity.
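
Taking those figures at face value, the per-token gap can be worked out as below. This is back-of-the-envelope arithmetic on the numbers in this comment only, not a pricing reference, and it treats "<100M" as exactly 100M, which understates the Enterprise rate.

```python
def usd_per_million_tokens(total_usd: float, tokens: float) -> float:
    """Average price paid per one million tokens."""
    return total_usd / (tokens / 1_000_000)


# Figures from the comment; "<100M" is rounded up to 100M (an assumption).
enterprise = usd_per_million_tokens(800, 100_000_000)
deepseek = usd_per_million_tokens(10, 600_000_000)

print(f"Enterprise: ${enterprise:.2f} per million tokens")
print(f"DeepSeek V4: ${deepseek:.4f} per million tokens")
print(f"Ratio: {enterprise / deepseek:.0f}x")
```
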

reply
pookieinc 4 hours ago
If this is as epic as it sounds, I wonder what the response will be from the other leading frontier labs / whether they even have anything to respond with at this level?
reply
ilaksh 4 hours ago
Look at the benchmarks. It's a big leap in some areas, but it's not like any of them are 60% better (if that could even make sense).
reply
Karrot_Kream 3 hours ago
Seems like Fable is doing a lot better on SWE-Bench-Pro and FrontierCode than GPT-5.5. Given how most folks I talk to and people online keep mentioning that GPT-5.5 was better than Opus, I'm curious what the experience is like now.
reply
skerit 2 hours ago
It's a very nice bump, but it is in no way worth all the hype of the past month.
reply
merlindru 4 hours ago
> During early testing, Stripe reported that Fable 5, [...] in a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.

Very impressive, but obviously not on the order of a whole-codebase migration

reply
christina97 4 hours ago
They do not claim to have migrated 50 million lines of Ruby. Simply that some migration took place in such a codebase.
reply
reddit_clone 3 hours ago
Converted all the tabs to spaces? :-)

You are right, this is not a rewrite like the Bun case.

The real news is, at 50M LOC, it is able to handle and do _something_ coherent.

reply
geodel 4 hours ago
Ok, so Stripe migrated their 50MLOC codebase from Ruby to Rust? Because that's what Bun did.
reply
timedude 2 hours ago
"Here, try our new model which falls back to the old model while eating your tokens."

Ok then...

reply
Growtika 2 hours ago
>Fable is the most capable model and takes 2× the usage of Opus

Imagine Apple announcing: 'Our most powerful iPhone yet. Battery lasts half as long.'

reply
PeterStuer 3 hours ago
If you are not seeing it under /model, do a /exit, then a Claude upgrade, then /model again and it should be there.
reply
franze 2 hours ago
Is this a good time to hustle for my "AI does not need a break but you do!"* app? Quite a lot of people will probably get AI brain exhaustion from maximising their "playing" with the new model until they take it away again.

* https://rainbreak.franzai.com/

reply
jsw97 3 hours ago
On my very first Fable 5 prompt, got flagged on a hard but completely uncontroversial option math problem, many tokens in. Although it's pretty clear that this is an unremarkable experience at this point.
reply
pianopatrick 2 hours ago
Seems like all a bad actor has to do to gain access is to compromise one of the partner companies that has access.
reply
stronglikedan 3 hours ago
Careful using this with Cursor, especially for corp use. Anthropic will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."
reply
knollimar 4 hours ago
I swear I read a joke that "what if we named chatgpt 5.5 Fable. Could we hype it as much as mythos?" Last week!
reply
erghjunk 3 hours ago
Nice branding.

I wonder how much butterfly habitat has been/is being replaced with data centers?

reply
rs_rs_rs_rs_rs 2 hours ago
If you ask me, not enough!
reply
theLiminator 3 hours ago
> We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI development, though we remain uncertain about the severity of these risks. In particular, our concern is with—as we wrote then—“accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose - without necessarily having commensurate safeguards.” In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.

reply
Overpower0416 4 hours ago
I would expect a release from OpenAI soon. The battle for who can pump up their IPO the most
reply
bradley13 3 hours ago
I use AI for a wide variety of things, of which technical is only a small part - and then it's usually a problem with project configuration, not coding. Why? Because I am often testing projects handed in by students. Projects that supposedly work on their machine, but certainly do not on mine.

Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.

ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...

reply
himata4113 3 hours ago

  > virtualization
  switching to opus 4.8
ok fair

  > embedded-allocator
  switching to opus 4.8
urgh fine

  > chrome
  switching to opus 4.8
are you kidding me?
reply
ako 2 hours ago
Tool use score is 17.4% that seems really low, what does that mean?
reply
217 4 hours ago
Oh my god it's actually here
reply
Retr0id 3 hours ago
The escalating nerfs of "cybersecurity" topics are incredibly frustrating. Opus 4.6 had boundaries that seemed reasonable to me, but 4.7+ turned it into a moralizing asshole. It'd be less bad if it just gave an error message, but instead it churns through a long thinking trace before writing an essay about why what you're asking is bad and wrong.

I'll be disappointed when 4.6 is retired.

reply
BenoitEssiambre 4 hours ago
Looks like a good model (sir). Costs are getting out of control though. 2x Opus and non-metered usage going away. We're quickly approaching the cost of a human salary for normal usage.
reply
vb-8448 3 hours ago
In a lot of places outside the US we are already above the cost of an average human.
reply
franze 40 minutes ago
btw in claude code

    /model claude-fable-5
reply
kypro 53 minutes ago
I just gave it a go at a problem I've been working on this week. Nothing fancy, just some inefficient code that we've been adding incremental improvements to for a while now to the point where some out-of-box thinking is probably required to push it any further – something Fable is obviously more than capable of.

After Fable did some thinking for a few minutes it gave some suggestions. A couple of them were valid (but very low impact, bordering on entirely pointless), but its main suggestion, oh man... It told me to make an update that would simply break the existing functionality.

So I thought about it for a moment...

Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?

I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.

This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me, I would view that very negatively. I was expecting something creative and interesting, not irrelevant and incorrect.

I'm sure it's a step up from 4.8 (although am not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.

reply
agnosticmantis 59 minutes ago
> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)

Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.

We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.

reply
randomguy_12 2 hours ago
It's surprisingly sensitive to biology research topics - even reviewing standard papers on tissue culturing is flagged as a problem
reply
rfgplk 4 hours ago
If the claimed capabilities are true, Fable 5 is already at a superhuman level. We might see genuine unprecedented leaps in technology now, across all fields.
reply
gear54rus 4 hours ago
yees, any second now!

the leap here is browser extensions appearing to block all mentions of ai across the web

and that's a good thing

reply
yokoprime 3 hours ago
Probably great for those who need this. I could continue using opus 4.6 class models for the foreseeable future
reply
dangoodmanUT 3 hours ago
Not comparing to GPT Pro models is a bit strange, considering that's the natural comparison
reply
wslh 4 hours ago
I am playing with it and it keeps switching to Opus [1]. The chat is a basic security review of a business project.

[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."

reply
cute_boi 34 minutes ago
Used it for a simple task and I got this message.

Fable 5's safety measures flagged this message. They may flag safe, normal content as well

reply
drob518 2 hours ago
Cracks me up that a system “card” is 319 pages.
reply
maxloh 2 hours ago
I just tried using Fable 5 with Claude Code by asking it to create a LaTeX document for me.

There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)

  Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
  
  Ran 2 commands
  
  Ran Search common LaTeX install locations
  Bash
  ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
  done
  
  Ran Download portable Tectonic LaTeX engine
  Bash
  cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
  Archive:  tectonic.zip
  Created by Cranko
    inflating: tectonic.exe            
  tectonic 0.15.0Tectonic 0.15.0
  
  Tectonic works. Now writing the bibliography and the full deck.
Opus 4.8 would just proceed without a compiler.

  Ran a command, used a tool

  Ran Check for LaTeX toolchain availability

  Bash
  cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  
  Used PowerShell
  command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
  description: Check for LaTeX engines on Windows PATH
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  lualatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  miktex: NOT FOUND
  tlmgr: NOT FOUND

  No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.
reply
asdK120 4 hours ago
In other words, Fable is Mythos with less compute and with some feel good "safeguards".

At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.

reply
algoth1 2 hours ago
The refusal rate is insane
reply
JustSkyfall 3 hours ago
Would be more impressive if the safeguards weren't so trigger-happy!
reply
arkwin 3 hours ago
Just wanted to comment here: I have been using Opus 4.6, 4.7, and 4.8 just fine to look for Linux kernel vulnerabilities (I'm in the cyber verification program), and it's been fine. I switched to Claude Fable 5, and now I'm getting policy violations.

What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.

reply
jwpapi 3 hours ago
Honestly, all the recent improvements just seem to trade speed and cost for more accuracy, but the issue is that it needs to be exponentially more accurate to counter the effect of having less of a human in the loop.

Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.

To me, we are very far off from it being economical to give long-running tasks to agents.

reply
hydra-f 4 hours ago
How much and what kind of data do you need to throw at these models to get a good design interface?
reply
taimurshasan 4 hours ago
I was on board until I saw "$50 per million output tokens". Lost me, bud.
reply
Ninjinka 3 hours ago
gah could model naming be any more confusing?

"Claude Fable 5: a Mythos-class model"

"we're also launching Claude Mythos 5"

what is the 5? how is mythos both a model category and a model name?

reply
Sathwickp 3 hours ago
input price $10 per mil tokens and output price $50 per mil tokens btw
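
For a feel of what those rates mean per request, here is a minimal estimator. It uses only the two prices quoted in this comment; the token counts in the example are hypothetical, and it ignores anything like caching or batch discounts.

```python
# Prices quoted in the comment above, in USD per million tokens.
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )


# Hypothetical agentic session: 2M tokens read, 150k tokens written.
print(f"${request_cost_usd(2_000_000, 150_000):.2f}")  # prints $27.50
```
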
reply
nevir 4 hours ago
"Fable 5 (disabled) Most capable for your hardest and longest-running tasks · Disable zero data retention to unlock Fable 5 access"
reply
logicallee 2 hours ago
What a (genuinely) surprising choice:

>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"

That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."

It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.

reply
dcchambers 56 minutes ago
Being unable to use this with zero data retention makes this feel like a non-starter for most enterprise customers.
reply
alvis 4 hours ago
Another thing to note: 30-day retention for all traffic on Mythos-class models

Is it good or bad? 30 days is a long time for anything bad to happen

reply
grumbelbart 2 hours ago
It's bad. I believe them not to use it for training, but it means relevant data can and will be exfiltrated by US agencies or through court orders (see NY Times vs. OpenAI, where only traffic without any retention was safe).
reply
hugodan 2 hours ago
mankind has reached its final destination
reply
darrinm 3 hours ago
Not supported in Claude Code yet?
reply
pmuk 3 hours ago
From inside a claude code session:

/model claude-fable-5

Or start claude code with:

claude --model claude-fable-5

reply
darrinm 3 hours ago
Yeah, /model fable also worked for me (despite not being shown on the /model list). Thanks.
reply
causal 2 hours ago
One thing I find kind of annoying is how Anthropic goes for these "vast and alien" names like Fable and Mythos, but then deliberately trains the model's personality to act like a cool high school teacher that feels totally familiar.

"It's too dangerous it's a Mythos!!" directly contradicts the "I'm the cool AI you can totally trust" vibe it is trained to project.

reply
bitwize 2 hours ago
All of these AIs kind of remind me of VEGA from Doom (2016), who will cheerfully walk you, in the most friendly computer voice, through the procedure of its own destruction without even a hint of self-preservation. "First, you must destroy my cooling system. That will cause my core to overheat. Then..."

Even HAL was less unsettling because HAL sounded creepy, and had some sort of preservation instinct, if only to complete its assigned mission.

reply
firemelt 2 hours ago
so should I use it with workflows?
reply
152334H 4 hours ago
i wasn't even trying and i got flagged already...
reply
kevinalexbrown 37 minutes ago
"tell me about biology" -> "Switched to Opus 4.8"
reply
shevy-java 2 hours ago
Fable? Fabelstories? (Fablestories, but the German word seems more poignant ... Fabelgeschichten ... Fabeln)
reply
segmondy 3 hours ago
Mythos, Fable, are they trolling us?
reply
IChooseY0u 3 hours ago
Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606 ⎿ Tip: You can configure model switch behavior in /config

biology? what the heck?

reply
aykutseker 3 hours ago
who's tried it: is 2x the usage actually worth it over Opus 4.8 for daily work?
reply
pmuk 4 hours ago
Anyone got it working in claude code yet?
reply
pmuk 4 hours ago
claude --model claude-fable-5

appears to work

reply
UncleOxidant 3 hours ago
> During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How in blazes do you end up with a 50M line Ruby codebase? WTF?

reply
ieie3366 2 hours ago
Very easy. Just have a monorepo and enforce the use of a single language. The company I work at has 1M lines of TS and Stripe has 50x our headcount, so it tracks pretty well.
reply
jckahn 4 hours ago
Cannot wait for the pelican for this one
reply
tsunamifury 2 hours ago
Claude 5 ran out of quota with TWO PROMPTS.

Let's let that sink in.

reply
throwaway2027 4 hours ago
Will try it when my limit resets.
reply
rarisma 2 hours ago
The subscription bit makes no sense. Has capacity appeared out of thin air for these 2ish weeks, only to vanish afterwards? Why is it available now but won't be in 2ish weeks?

Am I missing something?

Why would I pay 200 out of pocket and then some for the best model? It seems very silly.

reply
bnchrch 4 hours ago
An 11% jump over opus 4.8 and a 22% jump over gpt 5.5 on Agentic Coding Benchmarks is certainly impressive.

Obviously I still need to verify it for myself to see if it's truly a leap.

But am I the only one wondering, "What can I do today that I couldn't do yesterday?"

Previously I would think "Oh I wonder if I can finally get it to do X now?"

However, now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.

Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.

reply
yaodub 4 hours ago
[dead]
reply
pablogancharov 4 hours ago
you can select it using /model fable in claude desktop and claude-code
reply
asdK120 3 hours ago
Is this "system card" equivalent to the stone tablets handed down to Moses? Why don't you call it "user manual"?

Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?

reply
aesthesia 3 hours ago
Because it's not a user manual? The idea of a model card originated in 2018 (see https://arxiv.org/abs/1810.03993) as a summary of important facts about a model. At the time, this was typically an image classifier or tabular ML model. Model cards became an important concept in AI governance, and they started expanding once models started getting more capable. The point of a model/system card is to document where the model came from and the evaluations that have been run, make a case that the model will be safe and reliable in its intended applications, and warn about any potential dangers from misuse. It's not an explanation of how to use the model.

OpenAI also releases system cards; here's GPT-5.5's: https://deploymentsafety.openai.com/gpt-5-5/safety

reply
redox99 3 hours ago
It used to be a "card", as in a single page or two. It doesn't make sense that they still call it that.
reply
mmis1000 2 hours ago
If calling somebody on a phone is still 'dialing' even though there is nothing round on a smartphone, then why not?
reply
apsurd 3 hours ago
The trailing snark at the end will likely get you downvoted but I'm latching on: wtf is "system card". My previous coworkers popped that in the general slack channel when Mythos first "dropped" - "have you seen the system card" without any context whatsoever. The nerds get their clique!

Also research preview pops across new upstarts in place of beta. It's eye-rolling coming from a lifelong curmudgeon.

Just talk normal!

reply
simoncion 2 hours ago
I'd call it a "whitepaper".

But most hype-dependent projects need new vocabulary for old concepts to keep people from looking too closely and maybe drawing parallels to "legacy" "unsexy" projects, so whitepapers get called "system cards" and startups get called "labs", and so on.

reply
SpicyLemonZest 2 hours ago
Couldn't someone else equally well argue that "whitepaper" and "startup" are hyped-up vocabulary for "report" and "unprofitable company"? It kinda seems to me like the cause and effect are in the other direction, and the vocabulary of a particular niche becomes cool and hype-sounding when that niche starts to pull in a lot of money.
reply
apsurd 37 minutes ago
yes at some point language evolves as the new normal, as designed.

My curmudgeon gripe with system card and research preview is really the parroting, so I can't blame Anthropic for what others do. It’s just… no, prediction markets for dogs doesn’t have a research preview.

reply
deafpolygon 2 hours ago
Before long, we'll be having Claude Cylon-class models.
reply
bradley13 3 hours ago
Can we please stop with the extreme "safeguards"? I don't want to waste processing power on a model deciding whether it can answer my question, or ensuring that its answer is politically correct.
reply
beydogan 2 hours ago
My pet conspiracy theory is that this is the Opus 4.5 from a few months ago, which was extremely good but got dumbed down after a week because it was just too good and they didn't want to release it to the public. They pulled it down and deployed another "Opus", and after that it was all downhill. Opus 4.8 is unusable for me in React Native, TS, and Rails development work.

Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.

reply
system2 3 hours ago
I have been using Fable 5 with Claude Code since the morning. The speed is very close to what Opus 4.5 was, and the quota use is nearly identical to what it was before the "doubling". Whatever I was experiencing 4-5 months ago is back. Maybe the model is better, but we will see. I cannot tell the difference yet.
reply
kypro 3 hours ago
Out of interest, how have you been using it since this morning? Are you in some kind of pre-release group?
reply
system2 2 hours ago
No, it was available for the last 3 hours. I am on the West Coast, so it is still morning here.
reply
charcircuit 3 hours ago
>During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

Who is refactoring by hand? This comparison is not relevant in 2026.

reply
firemelt 3 hours ago
They are like drug dealers.
reply
xeyownt 3 hours ago
Anthropic, can you please stop the FUD?

Release your best model, let the world adapt and evolve, and let's move to the next thing.

reply
__lain__ 3 hours ago
It won't even run a basic /security-review command without reverting to Opus 4.8. Utterly useless.
reply
LoganDark 3 hours ago
I actually rather like the way they have approached these safeguards. Rather than only teaching the model to refuse a request, or completely rejecting the request, the system gracefully degrades to slightly less powerful or slightly less precise operation. So you still roughly have Opus 4.8 even when safeguards trigger, but with an upgrade when they don't. As much as I hate the way they hype Mythos 5, I think the release of Fable 5 is rather nice. What's not nice though is that they plan to remove it from subscriptions soon, but getting to try it is cool, I suppose.
reply
bitpush 4 hours ago
404?
reply
Philpax 4 hours ago
Looks like they're still getting the post out, but the model is live now, and the system card is at https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3... .
reply
w4yai 4 hours ago
Pelican guy ! Where are you ? :)
reply
fabled-out 3 hours ago
This i
reply
noncoml 2 hours ago
Imagine if Google would roll this out to the search engine. We can't let you search for that because it may be used for "evil"
reply
noncoml 2 hours ago
Can't wait for some real competition so they stop trying to restrict how and why we are using the models.

Imagine if Google would tell you "we can't let you search that as you may use it for harm".

Also 2x the usage of Claude? Your limits are already ridiculously low.

reply
byteoptimizer 4 hours ago
Is Claude Fable 5 Mythos?
reply
ishurand4 2 hours ago
Yeah, it is also known as Claude Mythos 5
reply
briandoll 4 hours ago
New chapter
reply
tekla 4 hours ago
Maybe at this point, Fable the game will be generated by AI as we play.
reply
jMyles 2 hours ago
> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas.2 Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government

...don't like the sound of that.

Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?

This seems like a way to get somebody nuked.

reply
christkv 3 hours ago
Meh, more hype for marginal improvements and, from what I'm hearing, badly calibrated guardrails causing it to stop mid-operation. I guess anything to juice an IPO.
reply
dominotw 4 hours ago
system card = marketing material with heavily gamed benchmarks.
reply
bitwize 3 hours ago
Cope harder. A year and a half ago, people were mocking Devin for claiming that AI could develop software at all. Yet here we are, when AI is developing most commercial software.
reply
dominotw 2 hours ago
non sequitur
reply
bitwize 2 hours ago
The point is, even if a model or tool doesn't have advertised features today, it soon will. We're in a breathtakingly rapid cycle, and even if software engineering isn't abolished "six months from now", in 10 years the world will look vastly different for people who touch computers for a living.
reply
catigula 3 hours ago
>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world

Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.

What's the value add here?

reply
manojkumarp 2 hours ago
[flagged]
reply
RishiByte 2 hours ago
[flagged]
reply
CoderAshton 3 hours ago
[dead]
reply
YumpiLumpus 28 minutes ago
[dead]
reply
Stevvo 4 hours ago
[dead]
reply
hmokiguess 3 hours ago
I got it to one-shot GTA 6, so we can finally play it. It only took "ultracode, make no mistakes" (/s)
reply
acentaur 4 hours ago
[dead]
reply
mugivarra69 3 hours ago
[dead]
reply
robertacion 4 hours ago
[dead]
reply
wslh 4 hours ago
It's ambiguous? Because it is about Mythos specifically and Fable != Mythos.
reply
ebiester 4 hours ago
I mean, if by right you mean "insiders leaked to make a few bucks..." sure?
reply
bjord 4 hours ago
I thought they said mythos was too dangerous to make generally available?
reply
Philpax 4 hours ago
"Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."

reply
dmix 4 hours ago
This is covered in their post…
reply
tomeraberbach 4 hours ago
"Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8."
reply
rvz 4 hours ago
You fell for their fearmongering and marketing fundraising call which was done on purpose.

Now they want to pause AI because of "recursive self improvement".

Fool me once shame on you fool me twice...

reply