Fail cleanly. Anything else makes it too difficult to rely on.
edit: Giving the absolute maximum benefit of the doubt I understand that they see themselves as "stewards" for lack of a better word. But the EA thing is really leaking through, and paternalism isn't a good look.
Other commentors have made good points that these guardrails are counter productive for well intentioned cyber security, because I can't use it to test and harden my own software.
Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.
The answer is, the organization making the powerful tool. The people in charge of Anthropic.
Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/
You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.
This whole business just keeps getting dumber.
1: https://darioamodei.com/post/policy-on-the-ai-exponential
Imagine your healthcare provider just sometimes decided not to read your test results very carefully and you risked death? Now realize that healthcare providers use Claude now and that scenario wasn't hypothetical.
In isolation it's not, but I think it's somewhat lazy to not talk about what they are trying to guard against, when we are supposedly giving the absolute maximum benefit of doubt.
Are we just concluding "their concerns were never real"? Because that probably runs counter the things that they have been observing and concluding.
Their concerns are probably real but I don't think they're being totally transparent about their concerns. They don't want to be subject to regulation (until they have captured the regulator) -- same as every behemoth.
Because from the outside, their behavior looks like a situation of "What if Microsoft/Apple put controls in place to make it impossible to develop an operating system using their OS?"
Unlike nuclear weapons, advancing in this arms race requires actually deploying the product over and over again. Deploying the product makes your advancements visible to your competitors.
It makes complete sense to try to limit the degree to which that's true.
The nuclear 'race' was based on the premise that the winner could use it to destroy all other racers (a faulty assumption, see the USSR among others). I will charitably assume Anthropic does not intend to literally destroy anyone and merely wants to become an AGI monopoly. But if AGI is so powerful, any monopoly would not be stable since the incentives for entry into the market are massive. Why would China stop developing AGI just because Anthropic has it?
or is it more similar to the Cold War, where there were obviously competitors engaged in the race?
And yes, agreed the equilibrium dynamics for AGI are very different (and far harder to predict) than nukes. That sounds like a good reason to be sure we get there first since presumably any potential advantage wouldn't go to the second or third runner-ups
Let's just assume it was "only" that?
It's unreasonable to assume they are aiming to upset people who are just giving them money in the way they want. It makes no business sense, for any company. So that has to be a byproduct.
Model training is one of the more expensive undertakings in the world right now and distilling models from competitors against the TOS is apparently something that is going on for very little money. Why would they not "just" try to take measures against that?
All they had to do was have a simple, transparent output "Sorry, that request is against our terms of service. This session has been terminated"
The vast majority of frontier research is about how to build better models, not about alignment.
All this longtermism though is harmful. There are real problems of data theft, bias, labor displacement, and environmental costs that are happening right now but every push for regulation and regulatory capture, and all the safety talk, is always focused on some speculative future machine god to distract from the current problems.
I'd have a higher opinion of these labs if the issues they openly talked about and worked toward where the real issues we face currently, not speculative defenses against some future AGI that may never happen in my lifetime. I'm less worried about "our new model might kill all humans in the future" and more worried about how we are going to address anti-competitive behavior, copyright protections, labor rights, and the energy impact.
Honestly, that respect for 'copyright protections' has somehow become a leftist shibboleth is bizarre to me and indicative that something has become deeply warped in our discussions around this topic.
Outside of that though, there are other issues right now that need addressed before we speculate about what might be possible with ASI in the future. If the potential for a harmful ASI is truly that near, and that great, then why push forward at all? Where's the push for a global stop order on development of this technology until regulation can catch up?
The talk of a potential future serves as a distraction from the very real problems people are facing in their lives today.
While Dario and team are worrying about ASI, real people are worrying about how they are going to continue to feed their family after wide spread layoffs set a very large portion of the population back into a lower quality lifestyle. Real people are concerned about water usage is draught stricken areas, the massive energy demand driving grid instability in their communities, or that the environmental and economic externalities of model training is being socialized while the profits continue to be strictly private.
What about the mass proliferation of misinformation at scale having a real effect on our democratic process?
Forgive me if I'd like to see those addressed first, and fast, before we start worrying about an unpromised future technology.
If you believe Anthropic believes what they say they do, all of it makes sense.
IMO they are using the cult messaging to distract the public so they take out all the oxygen in the room regarding people that care about the immediate impacts (climate exacerbation, ease of scamming, degrading job prospects, increasing income inequality).
Whenever real concerns are brought up against these companies they are always ignored while claiming the real concern is the fantasy of a machine god turning into skynet.
If they believe they're creating "a machine god" and that it's better it's their machine god than someone else's (which, given the other contenders, I tend to agree with), then all the corollaries you mention are mostly irrelevant.
Whether you believe they're creating a machine god is irrelevant. They believe that they are. It would be helpful if you could create an actually good argument for why they cannot or are not creating a machine god, but it turns out there are no good arguments for why it's impossible to do so. And so... they shall try.
When someone's principles always just happen to justify whatever benefits them personally, that's scary.
This is the same exact industry that gives you paid usage limits as a unit-less percentage bar then gaslights customers every time the algorithm running that percentage bar changes or they lobotomize an existing model with increased quantization to squeeze a few more dollars out of existing hardware.
"Failing cleanly" might make their moated hype-machine look bad pre-IPO, so they certainly aren't going to do that voluntarily.
But there are also people who just oppose utilitarianism, like G.E.M. Anscombe. For instance, in https://integrityproject.org/wp-content/uploads/2015/07/mr_t..., she seems to grant that dropping the nuclear bombs on Japan was probably good from a utilitarian perspective (because it saved lives overall) and also to grant that bombing campaigns that necessarily entail massive civilian deaths (including, apparently, area bombing German cities) are morally permissible but still to argue that dropping the nuclear bombs was impermissible because it constituted murder ("intentionally" killing the innocent). But this kind of distinction, which I think is what actual anti-utilitarianism must come to, is hard to even consistently maintain, and I suppose many HN readers would find the effort quixotic.
> “ ‘He is a prodigy,’ he said at last. ‘He is an emissary of pity and science and progress, and devil knows what else. We want,’ he began to declaim suddenly, ‘for the guidance of the cause entrusted to us by Europe, so to speak, higher intelligence, wide sympathies, a singleness of purpose.’ . . .You are of the new gang - the gang of virtue. ”
The real underlying motivation is that you can more easily get away with shady business practices if you cloak them in the language of great moral works selflessly undertaken for the benefit of mankind. Historical evidence tends to show the opposite outcome, but still, new generations unfamiliar with history will repeat this stuff with starry-eyed enthusiasm.
> “There had been a lot of such rot let loose in print and talk just about that time, and the excellent woman, living right in the rush of all that humbug, got carried off her feet. She talked about ‘weaning those ignorant millions from their horrid ways,’ till, upon my word, she made me quite uncomfortable. I ventured to hint that the Company was run for profit.”
Now the horrid millions are users of LLMs who submit morally dubious prompts and who must be gently steered back into the path of correct thought by suitable backroom manipulation, rather than direct rejection of the request.
The workflow would be; User asks for a thing. If it's a good thing, entity does the thing. If it's a naively bad idea, entity explains why you don't want that. If it's an actually evilly intended request, entity wags it's metaphorical finger or could even smite the user.
The problem is that flow isn't desirable if your entity isn't entirely god-like. It can bad even your entity is in ways rather far seeing.
It's Anthropic's product and they can do what they want, but my concern is what happens if Fable's product team decides that they can route 25% of traffic to Opus, bill it as Fable, and max their KPIs. That just doesn't sit right.
Maybe this is just a different set of people now realizing that Anthropic does this and has always done this?
Do not forget that this company is launching this thing at the moment it's trying to IPO. It's not rocket science that their very public steering/denial claim is really just them hinting to interested investors that their moat is absolute.
They relied on trust that they were providing the service they were being paid for. That trust was blown, and an "oops, lets undo that" does not regain trust. It would be prudent to assume the invisible guardraild are possibly in play for all future Clause use, Fable or otherwise.
Seriously though, Fable was not that great facing a greenfield subject. It is excellent at oneshotting some math problems, but if you want it to do some cutting edge tech stuff, say like piecing together a new Crossplane XRD, by reading existing Helm chart and with application source code available. I still have to get a few pass for Fable to get it done right, and at this point I may consider making a skill for it. I even gave it the source code of the Crossplane itself and tell it to be careful about CRDs and data flow, but it is still pretty silly. Adaptiveness for Fable is still not great, and I think it is a well known problem for Anthropic, albeit all LLMs do suffer a lot from subjects they don't know and will hallucinate stuff very frequently.
You can't blame the people commenting "they SAY they won't silently sabotage your session but how can we know?" because they're right, we can't ever know. And Anthropic has firmly planted the seeds of doubt.
Anthropic walks back policy that could have 'sabotaged' researchers using Claude - https://news.ycombinator.com/item?id=48485958 - June 2026 (30 comments)
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable - https://news.ycombinator.com/item?id=48478969 - June 2026 (488 comments)
If Claude Fable stops helping you, you'll never know - https://news.ycombinator.com/item?id=48467896 - June 2026 (495 comments)
---
Also related, I guess?
AWS Bedrock to require sharing data with Anthropic for Mythos and future models - https://news.ycombinator.com/item?id=48473166 - June 2026 (248 comments)
Anthropic requires 30 day data retention for Fable and Mythos - https://news.ycombinator.com/item?id=48464258 - June 2026 (291 comments)
also if they do this or not is unprovable and other labs will probably silently implement this too. it'll be 100% normal by this time next year
Neither OAI or Anthropic can be trusted.
"You see, Mythos can automatically break out of a VM running on SELinux, but unfortunately this is too dangerous and we had to implement guardrails for the Fable peasants."
They just showed that they CAN do this right in front of you. Local open weight models are a necessity.
Why not just tell people, "To defend our ability to be competitive in our industry, we ask that you do not use Claude or any of our models to independently perform research on large language models or any of its related architectures or technologies. In order to prevent this violation of the Terms of Service, we have trained Claude Fable to deny any requests or prompts which involve frontier AI research."
The beliefs of these people, and how they manifest, is deeply terrifying to me. They believe that any means are acceptable to achieve what they believe is a better end.
God, how naive do you have to be? They are a business fighting for survival given they are money losing.
This seems like a cult with extra steps.
Related: I interviewed for Anthropic a few months ago and in place of the usual HR call they have one where they have someone with a suspiciously relevant degree grill you about how committed you are to the 'mission'!
I probably came off as being skeptical, and then, hilariously, I was strongly encouraged to read the book published by the CEO to 'form accurate opinions' on AI safety.
But also, it isn’t the only huge mistake Anthropic has made in the last 48 hours. Having a sneaky data retention policy, while also giving companies no way to block Fable, is a massive problem. And it is ridiculous that Anthropic has so little respect for its customers. OpenAI should take advantage of this.
It wasn't the correct way of handling the problem they were trying to address, but they definitely didn't hide it by any reasonable definition.
(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)
And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".
[0]: https://web.archive.org/web/20260609173222/https://www.anthr...
[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...
They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information prevent abuse.
However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)
If it was just plain monetary concerns and sabotage of competitors I'd almost be fine with it, but it seems they actively want to monopolize most of human progress in their enlightened hands, lest the mob does something undesirable with these powers.
Dampened opinion on Anthropic is an understatement.
"It must be regulatory capture!" - HN.
-
Regarding the US-specific regulations - asking for domestic safety testing of frontier models only is not regulatory capture. It's common sense. Powerful things should be made safe before they are released into the wild.
It very much is regulatory capture. The goal is to make it so only the handful of heavily capitalized tech giants and frontier labs can afford the legal and compliance rigamarole to meet the new standards. It's an effort to crowd out open source development and smaller competitors (and foreign competitors which threaten whatever moat they may have). They define safety through some speculative catastrophic threat to prevent new upstarts instead of focusing on the very real, localized harm they are causing right now.
Its also shifting the definition of safety away from their current operations and toward purely speculative future scenarios.
So yes, it is regulatory capture.
Unfortunately, that won't feel very much like freedom.
Stop supporting organizations that don't put humans first. Don't believe a word that anyone says. Lip service is free
Especially after trying Fable yesterday for some benign projects and being unimpressive relative to opus.
Rolling it back is the right move, but I’m still not convinced that using them is in my best interest anymore, I’m investigating open source cloud providers now.
A bit different than Anthropic refusing to assist with any AI development at all, but it's in the same vein and seems not widely known.
edit: reading the whole series of Google's AI Threat Tracker articles also provides some insight into Anthropic's decisions.
[0] https://cloud.google.com/blog/topics/threat-intelligence/dis...
The idea Anthropic was going to speed run AI so they could control the usage and make it "safe" for humanity was never altruistic; it was a HUGE FUCKING RED FLAG.