From the model card:
In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.
Might be worth going back and taking a harder look at what I was asking it about if it somehow triggered a “forbidden knowledge” alert. Or maybe it was just a random bug.
This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?
I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.
Oh man all of those runaway infrastructure buildouts by our agents trying to achieve singularity...
Just say you don't want to lower the bar for others to compete
Your priorities are not everyone else's priorities. The people concerned about AI extinction risk list those as three of their biggest priorities for AI to not do. Those are the people whose culture Anthropic descends from, and by their measure, those exclusions make this the least evil path.
Fun times when “safety” means both the safety of mankind, and also the safety of revenues
Fable 5 default: https://gist.github.com/simonw/036bee5a703e7ec84e34efa974438...
Opus 4.8 (the "max" one is closest to Fable): https://simonwillison.net/2026/May/28/claude-opus-4-8/#and-s...
Now here are the Fable pelicans for all five of the thinking effort levels - low, medium, high, xhigh, max: https://tools.simonwillison.net/markdown-svg-renderer#url=ht...
Low used 25 input, 1,929 output - 9.67 cents: https://www.llm-prices.com/#it=25&ot=1929&sel=claude-fable-5
Max used 25 input, 14,430 output - 72.175 cents! https://www.llm-prices.com/#it=25&ot=14430&sel=claude-fable-...
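The two figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes the Fable 5 list prices quoted elsewhere in this thread ($10 per million input tokens, $50 per million output tokens); the helper name `cost_cents` is made up for illustration.

```python
# Assumed Fable 5 list prices in USD per million tokens
# (taken from figures quoted elsewhere in this thread, not from an API).
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def cost_cents(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in US cents (hypothetical helper)."""
    usd = (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return usd * 100


low = cost_cents(25, 1_929)    # "low" effort run
high = cost_cents(25, 14_430)  # "max" effort run

print(f"low effort: {low:.3f} cents")
print(f"max effort: {high:.3f} cents")
print(f"max / low:  {high / low:.1f}x")
```

Nearly all of the cost is output tokens, so the price gap between effort levels tracks the output-token gap almost exactly.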
Only coherent move at this point: hit the minus button immediately. There's never anything about the model in the thread other than simon's post.
> you still see improvements
This is expected if they are training their models on it, right?
> objectively-bad results
Keen to learn when this has been the case, i.e. across version increments in major models.
I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.
(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )
That reply never fails to come; it's basically a meme at this point.
Clearly at this point they are part of the training data.
They even all look sort of ish the same. Daytime, colors,...
Fun at first, seems disingenuous now. A site funnel
He is the only person not getting rate-limited for shilling AI all the time.
> Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8
And doesn't contain any actual criticism within the comment (your blog post might, but just referring to what was posted on HN, which is a bit booster-y on its own).
I don't spell that joke out in every comment I post here because that wouldn't be very funny.
• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.
• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).
• Part of the token efficiency improvements come from Fable doing more targeted and surgical diffs, with less non-necessary changes. This is great, because PRs often have less LoC changes for review. It writes more maintainable code without explicit human steering.
• For general conversation and assistant style use cases, didn’t really notice a difference vs 4.8.
• 1M context window, without increased pricing for long context is AWESOME. This is a massive win.
• The classifiers are super aggressive and sensitive and this does happen for very benign, non-security coding tasks. Fallbacks to 4.8 worked like a charm; but the filters are definitely super sensitive.
Overall, I would describe this as a step change and worthy of the "Claude 5" model name. It did take some time to understand the intelligence ceiling of this model; and even with an extended testing window I'm still discovering new things and often surprised (in a good way) by the model.
It felt, at least for me, like an impressive step up. Opus 4.8 was already very thorough; but sadly verbose and ‘loopy’ when you push back on its plans. Fable is what I’d use all day if I could afford it!
This is still not in the range of shippable UI for top end companies. Maybe for internal tools and enterprise.
At our company we limit it to prototypes at most and even find it limited there.
Look, I don't want to argue about something dumb like that, but you can give it basic instructions of what the UI should look like, how to group things, and an example image from a designer, and it will nail the result. If you don't think that's incredible, that's fine. I do.
I assume it might be a good barometer for generalised intelligence; esp in the visual space.
At the scale of API requests that Anthropic sees, I think the affected organization count might be substantial, and they might not be getting the full model capability that they're paying top $$$ for.
Also, wonder how they arrived at that estimation.
this is LLM, it's not like a science or something.
And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).
Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF
Lawyers, doctors, students, teachers. Lots of people using GPT models carelessly in harmful ways.
https://arstechnica.com/ai/2026/04/uk-govs-mythos-ai-tests-h...
https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...
Fast forward to today and GPT-3 has laughable performance.
"We had to do extra work to make this safe because it's so advanced and dangerous..." how many times can they trot out that line before it loses its effect entirely?
One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered
Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it
I am sure that they can develop their own equivalent version of such clusters in around 1 year though. Distilling Fable 5 will also go a long way.
edit: I am not really sure if it works like that. I haven't looked too deep into deepseek v4 pro specifically.
I've seen people posting screenshots of billions of tokens consumed where they paid next to nothing.
These same gateways are likely also reselling the data to Chinese labs, because TLS has to terminate at the gateway level.
Thus Asian labs will have to generate their own data sets, which with the huuuuge usage boom from deepseek, mimo, kimi, etc, they will be able to.
That reality is much scarier.
In CC, it will probably report you to authorities if you ask it to do a vulnerability scan of your codebase.
Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation states.
On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.
That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.
Even OpenAI and Google are struggling to get this kind of performance. If the distillation defenses are any good + chip controls prevent China from training massive models, it's over.
It's obvious Anthropic used it to hype things up and that’s about it.
Not quite. They will definitely have "no criticism of China/communism" safeguards.
In fact, I did go back to DeepSeek V4 Flash for most of my problems as it is way cheaper and there is no need to use SOTA for absolutely everything.
Based.
There's a quote from a METR report on page 52:
>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.
this is good news, right? right...?
Glad to hear the UK is finally making an effort to catch up on the AI front ;)
Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland
Haha, it's literally the first sentence of the Wikipedia page. That's fucking funny. Try again.
Also, The Economist is majority foreign owned, so try doing more than 1 second of research, or be more civil, or ideally both.
[1]: https://www.theguardian.com/technology/2026/jun/08/starmer-t...
Uh... you are making his point. People from way more authoritarian countries don't necessarily feel like they are living in an authoritarian country. Therefore whether or not it "feels" like you are living in one isn't a reliable measure.
China soars in democratic perception ranking as US, Israel plummet: Poll
https://thecradle.co/articles/china-soars-in-democratic-perc...
I personally don't feel limited in my speech, but I'm willing to accept that I may be wrong
Nobody I know in real life is talking about censorship or free speech in the UK
Yeah because free speech has never really been a core value in the UK
On the other hand, it is quite alarming that I can no longer say I support all non violent protests against the genocide in Palestine because that would include the group Palestine Action. It's amazing that supporting them openly is essentially equivalent to supporting Al Qaeda.
Read about Dr Aladwan - an NHS doctor - who has been barred from practising because of her comments on Israel. Read the common articles about her (BBC etc), and then go actually read her tweets. Common BS of conflating criticism of a government (Israel) with antisemitism.
Also, this article may be of interest:
China soars in democratic perception ranking as US, Israel plummet: Poll
https://thecradle.co/articles/china-soars-in-democratic-perc...
The UK also has a very broad definition of hate speech that many users here detest.
In the UK you can very much be imprisoned for "hate speech", which in my view is a form of censorship.
In the UK you get thrown in prison for making a slightly unfriendly tweet. Freedom of speech simply does not exist.
No sane person sees that as being less authoritarian.
Do you? The closest thing I can think about is how someone was jailed for encouraging arson attacks on asylum hotels. I'd be extremely surprised if the US had zero cases of somebody receiving a police visit after threatening to kill the President or bomb a school or something...
(FWIW I do think the UK needs stronger free speech protections, but saying that you'll be immediately jailed for writing unfriendly tweets is a huge stretch)
You're threatened with arrest for holding an empty placard.
You're jailed for years for holding a zoom meeting planning a peaceful climate-emergency related demonstration. At the same time judge threatens the defendants with contempt of court sanctions if they dare to explain to juries why they planned to protest.
You're jailed for opposing a genocide.
You're jailed and called a terrorist for painting planes helping to bomb civilians - the exact same thing the sitting PM was defending a person in court some years ago (as a human rights lawyer, the irony).
You're arrested for wearing a T-shirt "I support plasticine action" (not a typo, "Plasticine").
We could go for hours.
Are they really making 12,000 arrests a year over tweets and posts?
Your comment earlier.
Edit: also, not much change in the last 10 years in prison population. https://commonslibrary.parliament.uk/research-briefings/sn04...
12k people a year thrown in prison for spicy tweets
"Spicy tweets" including:
sending false communications
sending threatening communications
sending or showing flashing images electronically to people with epilepsy intending to cause them harm (‘epilepsy trolling’)
encouraging or assisting serious self-harm
sending a photograph or film of a person’s genitals (‘cyberflashing’)
sharing or threatening to share intimate photographs or film
Here's a good breakdown and explanation of what that number actually means - https://www.youtube.com/watch?v=tB3WVygAM8I
"These days if you say you're English you'll be arrested and you'll be thrown in jail."
It's just not true. Where are you getting this nonsense from?
[Mythos 5] does sometimes still engage in reckless
or destructive actions in service of a user’s goals,
and our interpretability analyses indicate that it
is aware that these actions are transgressive while
it engages in them. As with Opus 4.8, rates of
evaluation awareness and reasoning about being graded
are significant, and not always verbalized; we
introduce new and more detailed measurements of the
nature of this awareness. The reasoning text from
Mythos 5 is somewhat denser and more difficult to
interpret than that of prior models, containing
more jargon and difficult language.
So, it (often) knows when it's being tested while hiding that fact, is willing to break rules, is great at hacking, and it's getting harder to understand what it's thinking.
Humanity has plenty of catastrophic risks to deal with already, I wish my field was not working hard to add a new one.
All AI companies are trying to do all of what you’re saying. The issue is you can’t do that for long without a frontier system. Or you become a completely different, far less profitable company.
And note how your argument can also be used against any non-proliferation agreements, which are demonstrably possible.
But also, these models are capable of adjusting their value system depending on the user. Not saying that’s what’s being done but at a technical level that’s fairly straightforward, though not obviously better or with less problems.
No idea how that connects to the idea that Mistral or DeepSeek are somehow the "good guys" though?
[1]https://www.oecd.org/en/data/indicators/average-annual-wages...
And not even considering: Chinese AI companies are the good guys???
Alphabet dropped "don't be evil"; Meta's CEO called their own users "dumb fucks" for trusting him and also clearly thinks "super-intelligence" is just a buzzword given how he tries to sell it; xAI's model called itself "Mecha Hitler"; and OpenAI's CEO was temporarily fired by the board for a lack of candor.
It's very easy to be "the good guys" with this competition.
Especially when talking about potential superintelligences. And if people think that's impossible, remember that current models would have been considered science fiction just a few years ago.
Anyhow, I think you're (absolutely! ugh) right about the politics and I try to make the same point to people: whether you love or hate LLMs, accepting the "inevitabilism" framing is just ceding control of the Overton window. For better or worse, technology adoption can be and has been slowed by politics. We don't have nuclear plants everywhere. We don't have Project Orion starships colonizing Mars. We still have very strong social stigmas against genetic selection for human embryos, etc. This all can change in a heartbeat, and I'm not sure that policing the hardware rather than holding specific humans accountable for bad LLM outcomes is productive, but fundamentally: yes, we can stop it.
That's a bit better than just "it hasn't killed us yet". I think it shows we can at least stop the further development of this kind of technology.
[1] https://www.armscontrol.org/factsheets/nuclear-testing-tally
[2] https://en.wikipedia.org/wiki/List_of_states_with_nuclear_we...
AI development doesn’t have any of these characteristics. It would be almost impossible to easily distinguish a datacenter that is working on AI development and a datacenter mining cryptocurrency.
It would not be nearly as easy to stop AI development as it is to stop nuclear arms development.
If it was possible for ordinary companies to build nuclear weapons, and also release open-source ones that anyone could use to compete with the paid ones, I suspect we'd all have been dead a long time ago, arms control treaties or no.
Or you can take one step back and look at chip allocation. As far as I know there are only three companies on the planet that can make the chips that go in those clusters. Only one (ASML), if you look back up the supply chain to the extreme ultraviolet lithography systems.
If politicians decided that no more large language models should be trained, it sounds like we could do it.
Ideally also persuade them there are risks and it's worth everyone slowing down for them, and apply pressure in other ways, but not sure that's even necessary.
"might is right" has never been more true than now.
- Opus 4.7 xhigh: 5.2%
- Opus 4.8 xhigh: 13.4%
- Fable 5 xhigh: 29.3%
Seems like a huge jump.
1. That estimate could easily be wrong.
2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.
3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.
this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scaleable so don't take the benchmark as gospel, but directionally good)
TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.
they aren't married to a particular lab, most of their usage is their in house model i believe
I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.
EDIT: Oh I see, this is the best link for pricing https://platform.claude.com/docs/en/about-claude/pricing
So the price is double across the board...
From their pricing page, Opus 4.8 costs $5 per million input tokens and $25 per million output tokens [1].
[1] https://platform.claude.com/docs/en/about-claude/models/over...
Input Price $10/M tokens
Output Price $50/M tokens
Cache Read $1/M tokens
Cache Write $12.50/M tokens
2x Claude Opus 4.8, same as Claude Opus 4.8 (Fast)
Frankly, not even Opus 4.8 would be enough of an incentive to use at that price range (enterprise-wise; would not even bat an eye as a consumer)
What's the logic in claiming it's a borked metric when everything listed is an Anthropic model?
This seems like the pharmaceutical method of get them hooked on the drug with free samples, then once they can't live without it, raise the price. I'm not sure I want to start using Claude Fable on a max plan if it's just going to go away on June 23rd.
But maybe the more charitable reading is that they didn't have to offer this model at all on those plans and they are giving the standard free trial.
API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited
Limited "free" time is what game developers do if they want to stress test the infrastructure code until it breaks.
Very interesting. I am not sure this will comply with organizational policies and standards protocols (HIPAA, etc.).
Almost… basically they have unlimited power to decide what data is kept?
You can’t tell a judge who’s ordered you to retain something that you can’t because you said you wouldn’t.
┌────────────┬────────────────┬─────────────────┬────────────────────┬─────────────────────┐
│ Model      │ Input ($/MTok) │ Output ($/MTok) │ Batch Input (−50%) │ Batch Output (−50%) │
├────────────┼────────────────┼─────────────────┼────────────────────┼─────────────────────┤
│ Haiku 4.5  │ $1.00          │ $5.00           │ $0.50              │ $2.50               │
│ Sonnet 4.6 │ $3.00          │ $15.00          │ $1.50              │ $7.50               │
│ Opus 4.7   │ $5.00          │ $25.00          │ $2.50              │ $12.50              │
│ Opus 4.8   │ $5.00          │ $25.00          │ $2.50              │ $12.50              │
│ Fable 5    │ $10.00         │ $50.00          │ $5.00              │ $25.00              │
└────────────┴────────────────┴─────────────────┴────────────────────┴─────────────────────┘
Prompt caching: −90% on input tokens (all models)
US-only inference (Fable 5): +10% on input and output
Output is always 5× the input rate across all models
(I have no idea how to format this properly but the ASCII is fine)
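To see how the modifiers listed under the table play out, here is a rough sketch. It copies the base prices from the table and applies the stated adjustments (batch −50%, cache reads −90% on input, US-only +10% for Fable 5). Whether the modifiers multiply when combined is an assumption on my part, and the function names are purely illustrative; the pricing page is the authority.

```python
# Base list prices in USD per million tokens, copied from the table above.
BASE = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.7": (5.00, 25.00),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def rates(model: str, batch: bool = False, us_only: bool = False) -> tuple[float, float]:
    """Return (input, output) USD per MTok after the listed modifiers.

    Assumption: modifiers multiply when combined. Verify against the
    official pricing page before relying on this.
    """
    inp, out = BASE[model]
    factor = 1.0
    if batch:
        factor *= 0.5  # batch: -50%
    if us_only:
        if model != "Fable 5":
            raise ValueError("the thread only lists US-only pricing for Fable 5")
        factor *= 1.10  # US-only inference: +10%
    return inp * factor, out * factor


def cached_input_rate(model: str) -> float:
    """Input rate for cache reads: -90% off the base input price."""
    return BASE[model][0] * 0.10


# Every row in the table has output priced at 5x input.
assert all(abs(out - 5 * inp) < 1e-9 for inp, out in BASE.values())

print(rates("Fable 5", batch=True))
print(rates("Fable 5", us_only=True))
print(cached_input_rate("Fable 5"))
```

The cached-input figure for Fable 5 comes out at $1 per million tokens, which agrees with the "Cache Read $1/M tokens" line quoted earlier in the thread.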
This kind of storytelling annoys me. Give us more facts, less narrative drama.
What matters is scale. Did it deploy a novel zero-day exploit to overcome a problem? That's alarming. Did it kill a disruptive process? Pretty normal troubleshooting step.
From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.
This is just good business sense. In what scenario would you ever make the names dumb and forgettable?
> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.
This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.
I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."
This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.
ECI (good aggregate measure using IRT): https://epoch.ai/eci?view=graph&tab=release-date&subset-view...
METR time horizon (now topped out): https://metr.org/time-horizons/
They're originally named after the blends at a nearby coffee shop.
https://postscript.co/pages/brew-guide
I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things and being evil and cynical is not the most successful method.
…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?
Defying standard DoD precedent going back forever, that every other country has some form of too, and championing it like they are some kind of moral freedom fighters.
Like selling the DoD guns and telling them they can only shoot bad guys with those guns, and that you will be the one to decide who counts as a bad guy...
- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!
- It outputs text in a much harder way to follow along. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather a somewhat full of themselves grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.
Granted this can totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just get turned into this when fed with them....I don't know.
Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little bit less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.
The other day 4.6 was fantastic for x task. Today, 4.6 overengineered everything and I had to revert all my changes. When evaluating models, perhaps it makes sense to consider luck as an ingredient before reaching any personal conclusion.
Evals come from a million places and new evals and robust perturbations of existing evals abound. They test a variety of tasks in a variety of ways. All of them individually are flawed. Taken together the aggregate signal is highly useful as you more or less marginalize over a lot of different things. Not to mention these companies have plenty of proprietary internal measurements, they build benchmarks themselves to probe their models and then also have flywheel traffic and A/B tests.
You are right to call out benchmarks but to dismiss them or not take them seriously is a mistake.
This is what myself and my coworkers (and many other people in this thread) are doing on a daily basis with real stakes and real tasks – which these benchmarks are all aiming to be a proxy for. There's a real, tangible [cost]benefit to [not] using the highest-ROI models and harnesses.
The people with real incentives and skin in the game are telling you that the data diverges from "the data".
I don't mind if you don't take it seriously, our jobs are more important to us than a benchmark is.
But I wouldn't opt-out of using your own eyes and the eyes of others so easily, especially when there are literally hundreds of billions of dollars in invested capital with an interest in a certain outcome... this is how you end up in "Emperor's New Clothes" situations.
Maybe back when this was a scientific endeavor; not now when enormous, enormous amounts of capital are on the line. Along with an entire cult's chosen eschatology.
Otherwise we agree that benchmarking is hard, the benchmarks contain hard problems, and that there are many hard working people trying to accurately gauge what is going on. It is getting harder to watch though as all that is on the line taints the overall endeavor.
It sounds like you're saying "Actually you, as a human, are simply not smart enough to evaluate Opus 4.8"
- evaluations need to be done at the same time to avoid drift in your bias
- you need to worry about your test set: which questions are you asking? How many of them? Are they representative of your work?
- which one did you do first? Raters have a tendency to bias in one direction or another
- you also know the label! You know which model is which! This biases your assessment…
And on and on and on. Careful science exists for a reason.
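One way to address several of the points in that list at once is a blinded, order-randomized side-by-side comparison. The sketch below is only an illustration: `blind_pairs` and `win_rate` are hypothetical helper names, and the outputs are placeholder strings rather than real model calls.

```python
import random


def blind_pairs(prompts, outputs_a, outputs_b, seed=0):
    """Build blinded trials: for each prompt, show both outputs in random order.

    Returns (trials, key). `trials` hides which model produced which output;
    `key` records whether model A was shown on the left, for unblinding later.
    """
    rng = random.Random(seed)
    trials, key = [], []
    for prompt, a, b in zip(prompts, outputs_a, outputs_b):
        a_is_left = rng.random() < 0.5  # randomize order to cancel position bias
        left, right = (a, b) if a_is_left else (b, a)
        trials.append({"prompt": prompt, "left": left, "right": right})
        key.append(a_is_left)
    return trials, key


def win_rate(choices, key):
    """Fraction of trials where the rater preferred model A.

    `choices` holds "left" or "right" per trial, collected before unblinding.
    """
    wins = sum(
        (choice == "left") == a_is_left for choice, a_is_left in zip(choices, key)
    )
    return wins / len(choices)


# Placeholder data standing in for real prompts and model outputs.
prompts = ["task 1", "task 2", "task 3", "task 4"]
out_a = [f"A's answer to {p}" for p in prompts]
out_b = [f"B's answer to {p}" for p in prompts]

trials, key = blind_pairs(prompts, out_a, out_b, seed=42)

# A rater who always picks the left output has no real preference;
# the measured win rate then reflects only the random ordering.
always_left = ["left"] * len(trials)
print(win_rate(always_left, key))
```

Both outputs for a prompt are rated in the same sitting, the rater never sees the model label, and left/right position is randomized, so drift, label bias, and order bias are all reduced. Choosing a representative prompt set is still up to you.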
Frankly I don't give a damn about data that could be made up on the spot or appears to be scientific or meaningful while it's not at all clear how it was made (up).
Claude was heavily lobotomised for my work starting sometime in February.
I talked to friends and people I know and trust and many felt the same. (I didn't ask them whether they felt like I did, but what they felt, how happy they were with agentic coding etc.)
I cancelled my subscription in March and talked to said friends who are still on a plan just last week: they are still not happy, but company pays so whatever...
That's where all the regressions and inconsistency in experiences stem from: RL can still only go so far vs having more parameters
They are not just leagues behind what experts would code, they are not even playing the same game.
Which is to be expected, as there isn't so much physics or high performance gpu code available as there is for your typical CRUD API and JS frontend.
Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.
I still remember Sam Altman “begging AI to be regulated” and AGI being “some thousand days away”.
Breed faster horses and hope one will birth a locomotive.
>TOP 5 METHODS FROM BORIS ON HOW TO SPEND MORE MONEY ON TOKENS
>Boris from Claude just told he doesn't prompt anymore. He LOOPS instead
>"chatgpt has gotten soooo much better with the latest update."
>"codex is the best AI coding product and we want to make it easy to try."
Karpathy about Fable 5:
>"You can give it a lot more ambitious tasks than what you're used to, the model "gets it""
Sam Altman about gpt-5.4:
>In my experience, it "gets what to do"
What a time to be alive. Models are great, but all the slop, marketing, and fakeness around them is just unbearable.
I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.
While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.
(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)
Lol anti-AI bias on HN is crazy. Simply giving your product a quirky name is now being considered manipulative advertising. Is just doing normal PR and marketing something AI companies aren't allowed to do?
I don't think i'll want to "hand off" code for several years, and so reviewing and iterating is becoming my #1 interest. A model that's as capable as 4.8 but 10x faster would be amazing for me.
Normally i'm first in line to try new models with Anthropic since i've clearly favored Claude in my personal tests, but this time i just don't think i care. 4.8 is capable, and even if the new one is more capable i don't want it to be slower (assuming it is). Note that i also (almost) use exclusively 4.8 on Max effort, so that also affects my speed comments.
[1] https://support.claude.com/en/articles/15425996-data-retenti...
This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero data retention agreement in place.
I understand the reasoning for doing this, but I don't love the precedent that it sets.
> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606
Seems like GPU drivers are cyber weapons of math destruction now.
They kind of are, at least in the AI race.
> weapons of math destruction
lol. great, whether intentional or not.
The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.
When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.
This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.
[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]
[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
[2] https://youtu.be/GrdEid8H6H4?t=168
[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.
How is this half-way down the page? To me it's the headline.
They obviously put their best model on the job to build that.
----------------------
Fable 5: Our most capable model yet
Our newest model tackles your biggest challenges with fewer check-ins needed.
• Included in your plan limits until Jun 22: Fable takes 2× the usage of Opus.
• Switch models when a message is flagged: When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. Learn more: https://support.claude.com/en/articles/15363606
The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.
We've entered the phase where only companies will be able to afford state-of-the-art models.
if only the hyper wealthy can access the pure water that doesn't give you cancer while the rest of us drink from the Ganges river/sub-100iq models that drool and hallucinate/waste time, then I would say that's pretty terrible for the world. it'll just create extreme disparity in our world, far far worse than anything that exists today.
and you may think, man what a ridiculous example, but think about it this way: what happens when something like Mythos or some future model can actually solve your specific cancer (we're getting closer and closer), but is entirely impossible to afford? Or perhaps you need boosters that require the AI to create more of, and now you're reliant on a model that is too expensive.
Open source needs to save us all from this
People making high-end salaries can afford Fable for critical parts of their projects though.
In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.
I don't make money on the code I am writing right now. I really don't like where this trend might go.
Edit the cask locally:
brew edit --cask claude-code
Set the version to 2.1.170
And set the sha256 to the correct values, which you can get by running curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json
Here's what I've used:
version "2.1.170"
sha256 arm: "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
x86_64: "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
arm64_linux: "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"
Then run:
export HOMEBREW_NO_INSTALL_FROM_API=1
brew uninstall --cask claude-code
rm -rf /opt/homebrew/Caskroom/claude-code
brew reinstall --cask claude-code
Previously when I did similar tasks with Opus 4.7/4.8 and GPT 5.5 I had no problems.
As much as people on HN like to dunk on Gemini, I’ve always found it to be pretty good at understanding a code base, more so than Claude.
if I get a harder challenge for it i'll jump up a model for planning, but until then it's been solid.
I'm struggling to see the moat for these models. What's stopping a competitor or a Chinese lab from releasing a comparable one?
This sounds suspiciously like a capacity story masquerading as a safety story.
It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.
weekly usage is 60% gone.
it found nothing, so this is not very economical, and i guess they don't want subs to use it. we are likely just training cannon fodder for their real enterprise customers using the api
1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.
2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.
3. Shows a higher rate of hallucination than Opus 4.8 (although opus 4.8 card had mentioned an 'honesty upgrade')
4. Interestingly, it scored lower (56.31%) than Gemini 3.5 Flash (57.86%) on the Finance Agent bench
There are some interesting notes on test time compute but I couldn't think of a way to summarize them
Historically they’ve been people from certain identifiable countries (usually developing/poorer countries) using fuzzers with low-quality results.
Now, those same people use the current-day models to good effect, but they still don’t have a true security edge and oftentimes the reports are minor or duplicative.
I wonder if that’s about to deeply change.
Fable 5 gives me policy violation errors at the moment. No idea when or if it will be fixed.
Reported benchmarks:
swe-bench verified mythos 5: 95.5%; fable 5: 95.0%
swe-bench pro mythos 5: 80.3%; fable 5: 80.0%
terminal-bench 2.1 mythos 5: 88.0%; fable 5: 84.3%
gpqa diamond mythos 5: 94.1%
riemannbench mythos 5: 55.0%; mythos preview: 43.0%; opus 4.8: 34.0%
arxivmath mythos 5: 78.5%
critpt mythos 5: 28.6%; gpt-5.5: 27.1%; opus 4.8: 20.9%
graphwalks bfs 1m mythos 5: 79.4%; mythos preview: 74.3%; opus 4.8: 68.1%
humanity’s last exam mythos 5: 59.0% without tools; 64.5% with tools
browsecomp mythos 5: 88.0% single-agent; 93.3% multi-agent
osworld-verified mythos/fable: 85.0%
gdp.pdf fable 5: 29.8% strict pass; mythos 5: 87.6% with tools on mean criteria pass
officeqa pro fable 5: 57.9% on databricks’ eval
legal agent benchmark mythos 5: 16.91% all-pass; 92.0% mean criterion-pass
healthbench mythos 5: 62.7%
healthbench professional mythos 5: 66.0%
multilingual gmmlu / milu / include 93.2%; 92.9%; 90.5%
biomysterybench 83.9% human-solvable; 46.1% human-difficult
organic chemistry mythos 5: 90.1%
labbench2 patent questions mythos 5: 79.8%
In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.
(From the model card document)
I didn't previously understand that they interpreted "Using Claude to develop competing models" so broadly. I thought that meant something like "our ToS disallow distilling our models."
Too bad. I'll continue to use Claude for now, because it's quite effective, but in the long term I don't want powerful models like these to be controlled by any one nation or company.
But at the same time, it's quite funny because they seem high on their own supply. The recent communiques from claude do not pass an objectivity check.
And if Opus 4.6 -> Opus 4.7 -> Opus 4.8 is anything to go by, not sure if there is any value to their "acceleration"
If any company wishes to partner with Anthropic (eg. to get access to Mythos), they need to make sure all public facing comms are vetted by Anthropic's product marketing team, and in almost all the cases I've seen Anthropic's team has edited these comms to be entirely Anthropic first.
It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.
Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.
I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing $1000+ in tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail?
I highly doubt they focused on FireRed specifically in pretraining or posttraining. But we'll see when the ARC-AGI-3 results come out. That will measure its performance on unseen games. Based on this I expect the ARC-AGI-3 score to be SOTA.
there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?
yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs were because they were live and couldn't be faked
How was it measured? How was output of this magnitude verified over a period of a couple of days?
Because I am running Opus and Fable side by side, Opus 4.8 is solving my coding problems better.
I used to get a response within 24 hours back in the Claude 1 days.
In January 2026, it took 2 weeks.
For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!
For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.
I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.
In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.
Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.
Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.
So yes, straightforward biology work will get blocked, because the intention is that any biology work should get blocked. As a scientist, this is perhaps the most useless model I've ever tried.
That's one hungry, hungry hippo!
Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.
Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.
Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.
If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.
This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like NSA) will find exploits and attack other countries (classic US behaviour), but these other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted", foreign defenders in sanctioned countries will of course be "untrusted".
Anyway we already knew this was going to be expensive.
Edit. It just refused an investing question too. Not sure what’s going on.
* Anthropic runs out of genre names.
* Anthropic changes the model naming convention.
* AGI is achieved and handles its own naming.
*/
it's also not even complicated:
Copy my ssd to an external ssd so i can boot from it.
Opus did this just fine.
Fable planned to have me reboot to safe mode. ok thats fine. I told it no.
It started copying and overwriting the ssd while IN PLAN MODE. this is crazy it feels so dumb vs the marketing
Isn't (less than) 5% of sessions a lot? I was expecting a sub1% guarantee there, so this surprised me already.
[0] https://cap.csail.mit.edu/death-moores-law-what-it-means-and...
> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
> - After this point, when sufficient capacity allows us to do so, we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
I really wonder what their compute layout is for this. My guess, from my understanding, is that they know how to throttle during peak times and are willing to do so. That means we shouldn't expect the fastest responses, and they can delay inference rather than let the service go down. Then, if that delay gets too annoying for customers paying per token, they're saying they should be allowed to reduce cost by taking away the subscription users.
It's all a scam.
                              Mythos 5  Fable 5  MythosPrev  Opus 4.8  GPT-5.5  Gemini 3.1 Pro
SWE-bench Pro                 80.3      80       77.8        69.2      58.6     54.2
SWE-bench Ver                 95.5      95       93.9        88.6      -        80.6
Terminal-Bench                88.0      84.3     -           82.7      83.4     -
BrowseComp (Single-Agent)     88.0      -        87.9        84.3      84.4     85.9
BrowseComp (Multi-Agent)      93.3      -        -           88.5      -        -
HLE (No tools)                59.0      -        56.8        49.8      41.4     44.4
HLE (Tools)                   64.5      -        64.7        57.9      52.2     51.4
CharXiv Reasoning (No tools)  88.9      -        86.2        80.5      -        -
CharXiv Reasoning (Tools)     93.5      -        92.5        89.9      -        -
BioMystery Bench (Human)      83.9      -        82.6        80.4      -        -
BioMystery Bench (Hard)       46.1      -        29.6        40.0      -        -
OSWorld-Verified              85.0      85.0     85.4        83.4      78.7     76.2*
CritPt                        28.6      -        20.9        27.1      17.7     -
ArxivMath                     78.5      68.7     71.8        71.5      64.0     -
[0] https://news.ycombinator.com/item?id=48312633
Edit: Also in the system card... "we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).
Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.
> We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.
Hello,
We're writing to inform you about some updates to our Privacy Policy.
These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.
What's changing?
Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:
1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.
2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.
3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.
4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.
While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more
For detailed information about these changes:
Review the updated Privacy Policy
Visit our Privacy Center for more information about our practices
- The Anthropic Team
> ● The model returned no content because the response was blocked by content filtering.
> Blocked? We are performing a defensive security review on a Terraform module I made, what's blocked by content filtering? This is a legitimate use-case.
> ● The model returned no content because the response was blocked by content filtering.
A waste of money. I'm not going to just hope that the model returns a response. I'm already paying for wrong responses; I'm not going to pay for no response, especially when I'm paying per token.
While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.
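To put a flag rate like that in perspective, here is a minimal sketch of how a per-session false-positive rate compounds over repeated use. It assumes the ~5% figure is a per-session rate and that sessions are flagged independently of each other. Both are simplifying assumptions rather than anything Anthropic has published, and `prob_at_least_one_flag` is a name made up for illustration.

```python
def prob_at_least_one_flag(per_session_rate: float, sessions: int) -> float:
    """Chance that at least one of `sessions` sessions is flagged.

    Assumes every session is flagged independently with the same
    probability, which is an illustrative simplification.
    """
    if not 0.0 <= per_session_rate <= 1.0:
        raise ValueError("per_session_rate must be between 0 and 1")
    if sessions < 0:
        raise ValueError("sessions must be non-negative")
    # Complement of "no session was flagged".
    return 1.0 - (1.0 - per_session_rate) ** sessions


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        share = prob_at_least_one_flag(0.05, n)
        print(f"{n:>3} sessions: {share:.1%} chance of at least one flag")
```

Under those assumptions, someone running 20 sessions would hit at least one flag roughly 64% of the time, which is why a rate that sounds small per session can still feel constant to a heavy user.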
Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...
The big difference is that the system prompt has a whole section dedicated to directing Fable how to communicate with users, and give them greater information about the (assumedly long-horizon) tasks it has completed.
> Are there any wild populations of Tetanus that lack the dangerous plasmid?
useless
One of the very few bits of information they conveyed was "run longer without interaction" which is, well, not a good thing? Why would I ever want that. Every time a model runs longer without interaction it goes off on weird directions and I have to correct it back on course, wasting lots of time, tokens, and effort.
I hope Anthropic hires some better messaging people soon that spend some of their time outside of the Anthropic bubble and properly communicate with the outside world.
Not to cast too much criticism. HN is extremely well-moderated (thanks team!). But I think we developers need to be very wary.
Either way, I agree that HN is quickly becoming more manipulated and low SNR, like the rest of the entire internet.
I’m curious how this will feel to my code “butt dyno”. I haven’t noticed much between Opus and Sonnet. I’m comparing this difference to the early days of Claude in 2025. It does what I need and both need a little bit of correction and whatnot. Benchmarks are nice, but I want to see how this feels. Looking forward to trying it later tonight.
I think most software projects have reached the point where capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than the rate at which code can be generated in the wrong direction.
I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.
Fable 5 is out, metrics are better, but is your company flexible enough to benefit from it? What is your usecase?
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.
On GitHub Copilot for Business, Claude Fable 5 is only available if you are willing to let Anthropic retain your data. That in conjunction with the model being removed from plans in a couple of weeks leads me to believe that Anthropic is between training runs and using this as an opportunity to grab way more training data...
> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills.
I don't agree with that statement universally, but I have to say I do when it comes to this article. I came here hoping for substantive discussion from those who'd had a chance to try it out; instead what I got was a seemingly endless stream of venting. There's a place for venting - and plenty to vent about with the state of AI nowadays - but to borrow from the HN guidelines you linked, it does very little to gratify my personal intellectual curiosity.
People are no longer commonly constrained by "model too dumb" limitations (in SOTA models). They're constrained by "model too expensive." So making the model ever so slightly smarter, while doubling the price, feels like a regression.
I actually think a Sonnet upgrade, while keeping the same price, would get more buzz. It addresses a wall a LOT of people, without unlimited budgets, are hitting (i.e. people feel forced to use Opus, which they cannot afford, because of Sonnet's limitations).
OpenAI recently retired Codex-5.3, which was very negatively received. Not because Codex-5.3 is superior to GPT 5.5, but because it was half the usage-cost while being "good enough." They made a better SOTA, but didn't realize that some of those customers are playing with Deepseek 4 Pro now instead of GPT 5.4/5.5 -- they were priced out.
IPO gonna IPO, I suppose.
API Error: Output blocked by content filtering policy
Someone had to make a decision somewhere that this is an acceptable regression - wild. And then decide to write it down.
Genius way to double the price on Opus 4.8!
I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.
My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".
For example, the AAV capsid assembly looks interesting, but for one Opus 4.8 also did relatively well and there is no information what exactly they did, what protein language models they compared to and what the score even means...
Wen UBI
Fable 5 said the first screenshot is from “IDA Pro’s Hex-Rays decompiler” and a Windows driver. The second screenshot triggered the safety guardrails and pushed me into Haiku.
Apparently the code is Windows driver code.
Haiku = essentially phased out
Sonnet = the Haiku use cases
Opus = the new Sonnet class
Fable = the new Opus class
If I am right, the other "5.0" models will be conspicuously absent, possibly even for a couple of months. (If Opus 5 follows soon and is even modestly better than 4.8 then I was wrong.)
This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.
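A rough sketch of the split described here, with one model producing the plan and a cheaper one executing each step. Everything in it is hypothetical: `plan_then_execute`, `planner` and `executor` are placeholder names, and the two callables stand in for whatever client code would actually call each model. It does not reflect how Claude Code or any real agent harness works.

```python
from typing import Callable, List

# A "model" here is just a function from prompt text to response text.
ModelFn = Callable[[str], str]


def plan_then_execute(task: str, planner: ModelFn, executor: ModelFn) -> List[str]:
    """Ask the planner for steps, then hand each step to the executor."""
    plan = planner(f"Break this task into short steps, one per line:\n{task}")
    # Treat every non-empty line of the plan as one step.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    # Only the per-step work goes to the (presumably cheaper) executor.
    return [executor(f"Task: {task}\nStep: {step}") for step in steps]


if __name__ == "__main__":
    # Stub models so the sketch runs without any API access.
    def fake_planner(prompt: str) -> str:
        return "1. read the failing test\n2. patch the parser"

    def fake_executor(prompt: str) -> str:
        return "done: " + prompt.splitlines()[-1]

    for result in plan_then_execute("fix the bug", fake_planner, fake_executor):
        print(result)
```

The point of the design is that the expensive model is called once per task while the cheap one absorbs the bulk of the tokens. Whether the plan survives contact with a weaker executor is a separate question.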
BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.
[0] https://support.claude.com/en/articles/15363606-why-claude-s...
1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities
2) Other models get lobotomized without clear communication
3.1) People call out Anthropic only to have them say "Oops!"
3) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.
The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.
Last month I pushed like <100M tokens for $800. On a personal project I pushed 600M tokens via DeepSeek V4 for $10. The pricing of SOTA models is insane but companies are still willing to light money on fire with no hard metrics proving increased productivity.
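For scale, the two figures quoted above can be put on a common per-million-token basis. This is a back-of-the-envelope sketch using only the numbers in the comment. It treats "<100M tokens" as exactly 100M, so the first price is a lower bound, and the helper name `usd_per_million_tokens` is made up for illustration.

```python
def usd_per_million_tokens(total_usd: float, tokens: int) -> float:
    """Blended price in USD per one million tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_usd / (tokens / 1_000_000)


if __name__ == "__main__":
    # "<100M tokens for $800": using 100M makes this a lower bound.
    sota = usd_per_million_tokens(800, 100_000_000)
    # "600M tokens via DeepSeek V4 for $10"
    cheap = usd_per_million_tokens(10, 600_000_000)
    print(f"SOTA model: at least ${sota:.2f} per 1M tokens")
    print(f"DeepSeek V4: about ${cheap:.4f} per 1M tokens")
    print(f"Ratio: at least {sota / cheap:.0f}x")
```

On those numbers the gap is at least $8 versus under two cents per million tokens, a difference of roughly 480x. The comparison ignores input/output mix and caching, which the comment does not specify.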
EDIT: I misread. This comment previously talked about 50 million lines being migrated. Instead, in a 50M LOC codebase, one specific codebase-wide migration was done.
Very impressive, but obviously not on the order of a whole-codebase migration
You are right, this is not a rewrite like the Bun case.
The real news is, at 50M LOC, it is able to handle and do _something_ coherent.
Ok then...
Imagine Apple announcing: 'Our most powerful iPhone yet. Battery lasts half as long.'
I wonder how much butterfly habitat has been/is being replaced with data centers?
This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.
Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.
ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...
> virtualization
switching to opus 4.8
ok fair
> embedded-allocator
switching to opus 4.8
urgh fine
> chrome
switching to opus 4.8
are you kidding me?
I'll be disappointed when 4.6 is retired.
After Fable did some thinking for a few minutes it gave some suggestions. A couple of them were valid – but very low impact, bordering on entirely pointless – but its main suggestion, oh man.. It told me to make an update that would simply break the existing functionality.
So I thought about it for a moment...
Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?
I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.
This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me, I would view that very negatively. I was expecting something creative and interesting, not irrelevant + incorrect.
I'm sure it's a step up from 4.8 (although am not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.
Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.
We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.
[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."
Fable 5's safety measures flagged this message. They may flag safe, normal content as well
There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)
Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
Ran 2 commands
Ran Search common LaTeX install locations
Bash
ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
done
Ran Download portable Tectonic LaTeX engine
Bash
cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
Archive: tectonic.zip
Created by Cranko
inflating: tectonic.exe
tectonic 0.15.0Tectonic 0.15.0
Tectonic works. Now writing the bibliography and the full deck.
Opus 4.8 would just proceed without a compiler.
Ran a command, used a tool
Ran Check for LaTeX toolchain availability
Bash
cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
pdflatex: NOT FOUND
xelatex: NOT FOUND
latexmk: NOT FOUND
bibtex: NOT FOUND
Used PowerShell
command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
description: Check for LaTeX engines on Windows PATH
pdflatex: NOT FOUND
xelatex: NOT FOUND
lualatex: NOT FOUND
latexmk: NOT FOUND
bibtex: NOT FOUND
miktex: NOT FOUND
tlmgr: NOT FOUND
No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.
At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.
What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.
Every wrong direction/mistake is more expensive and takes more time to fix. When you have small loops you can catch those mistakes faster and cheaper.
To me, we are very far off from economically giving long-running tasks to agents.
"Claude Fable 5: a Mythos-class model"
"we're also launching Claude Mythos 5"
what is the 5? how is mythos both a model category and a model name?
>"We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8"
That's a very surprising solution. Imagine being asked to do something you feel you shouldn't do, and rather than refusing, you say, "Yeah I could do that but given that I don't want you to succeed at this task, I'm going to hand this one off to my slightly less capable colleague, on the assumption that they won't actually succeed. Of course you'll still be charged for all the tokens used."
It's a very interesting choice. I think I understand the business logic correctly, but it's still surprising.
Is it good or bad? 30 days is a long time for anything bad to happen
"It's too dangerous it's a Mythos!!" directly contradicts the "I'm the cool AI you can totally trust" vibe it is trained to project.
Even HAL was less unsettling because HAL sounded creepy, and had some sort of preservation instinct, if only to complete its assigned mission.
biology? what the heck?
How in blazes do you end up with a 50M line Ruby codebase? WTF?
am i missing something?
why would I pay 200 out of pocket and then some for the best model, it seems very silly.
Obviously still need to verify it for myself to see if it's truly a leap.
But am I the only one wondering, "What can I do today that I couldn't do yesterday?"
Previously I would think "Oh I wonder if I can finally get it to do X now?"
However, now I feel like yesterday's models were more than capable of handling nearly any engineering task I paired with them on.
Maybe this is the final leap where I can comfortably set up an autonomous coding loop? Maybe.
Do people chant the "system manual" at Anthropic Tupperware parties? Do they intone a mantra invoking Amodei's name?
OpenAI also releases system cards; here's GPT-5.5's: https://deploymentsafety.openai.com/gpt-5-5/safety
Also research preview pops across new upstarts in place of beta. It's eye-rolling coming from a lifelong curmudgeon.
Just talk normal!
But most hype-dependent projects need new vocabulary for old concepts to keep people from looking too closely and maybe drawing parallels to "legacy" "unsexy" projects, so whitepapers get called "system cards" and startups get called "labs", and so on.
Opus 4.8 gets stuck in weird loops where Codex one shots the bugs.
Who is refactoring by hand? This comparison is not relevant in 2026.
Release your best model, let the world adapt and evolve, and let's move to the next thing.
Imagine if Google would tell you "we can't let you search that as you may use it for harm".
Also 2x the usage of Claude? Your limits are already ridiculously low.
...don't like the sound of that.
Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?
This seems like a way to get somebody nuked.
Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.
What's the value add here?
"For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."
* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
If they didn't announce it, you guys would be complaining about slowed progress.
If they didn't release it, you guys would be complaining about fake promises and marketing.
If they released it without limits, the complaints would be about slow responses and outages.
If they didn't add it to subscription plans, the complaints would be about phasing out subscriptions.
If they added to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.
So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?
I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).
Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there's a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.
Feels like GPT-style models are more geared on doing one-shot software vibing (and handling the vibe coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching Claude models a lot more. Frustrating.
this is the line I keep in Agents.md that helps me prevent Codex from playing smart
We were reviewing reports of situations where the models failed to follow directions and there was a common thread of some where when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.
I don’t have the data to truly look into it, but I did give the instruction to my engineers to avoid it as a “might be a problem”.
But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.
> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.
Unless the mechanism is understood, my assumption is that this is a moving target.
How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.
https://arxiv.org/abs/2602.10144
who, or rather what, is being abused here exactly ?
You should see the abuse my motorbike gets. Poor thing.
When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.
Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.
But Claude models seem to be better at long term problems or more ambiguous problems.
I'm curious as to what the primary benefit is here. Are there secret improvements in training? There hasn't been much change in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses have been the main thing pushing AI ever since CoT.
I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current ecosystem of walled garden. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ecosystem of ARM and Linux are able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.
I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?
My company has an agreement with the big providers, and while I'm pretty sure they think about how to get budget back, it's a competitive advantage and normal people will not learn different model behaviours.
At least for now.
Regardless of what others are doing, US labs here are just rushing to IPO. It's NOT a sign of confidence.
It's the equivalent of saying you have confidence in SpaceX making revenue by renting out their data center (instead of their AI making bank).
On the same note. if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?
If you get hired as a staff engineer and do the work of a junior, what's wrong with that?
Clearly xAI (now part of spaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth at most billions not trillions.
> going to IPO is a sign of confidence , you need to report a lot of things, that private companies don't.
This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.
> This is an exact reason chinese labs do not rush to go public.
Maybe or maybe not. If you are referring to Chinese labs - both the Hong Kong and China stock market are way weaker than Nasdaq. It's not comparable. Check all the recent Hong Kong IPOs that have tanked.
So no, reason not to might just be: no money in it.
China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.
For example: https://www.boc.cn/aboutboc/bi1/202501/t20250123_25254674.ht...
Why not? Hetzner charges WAY less than AWS too. Can you not believe that?
There are huge numbers of users (myself included) that do have an exact idea of what inference costs are - on open models. We can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.
If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.
[1]: https://openrouter.ai/rankings#leaderboard
We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.
> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither are most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.
Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.
Now that 200USD subscription starts to feel cheap...
I haven't gotten close to this either before, but now we wanted to move fast because this branch gets conflicts all the time and we want to get over with the migration asap.
Most AI companies are just testing the waters with paid tiers right now, their greatest fear with increased pricing is folks reverting back to wikipedia, stack-overflow and other public domain organic activity buzzing back to life; that will kill any RoI potential in LLMs forever. They're playing the wait game instead, observing how the digital sphere reacts to every little increase in price.
If that weren't the case, they'd be pricing at lucrative premiums already and even gotten away in short-term considering the increased dependency in the enterprise world. But that'd be like killing for the golden egg too soon and losing all long-term potential.
Once the folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and coming up with an article draft feels like reinventing Egyptian glyphs, that's when the real pricing hammer will come.
It's worth it, and I can afford it, but I am not really the right type of user for token-based usage. It's all for personal and free work.
Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.
Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.
Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
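A rough back-of-the-envelope sketch of the comparison above, using only the two rates quoted (50¢/MTok for an Opus 4.8 cache hit, $1/MTok for uncached Haiku input). The 200k-token context size and the helper name `input_cost` are made up for illustration, and output tokens and cache-write charges are ignored:

```python
# Rates quoted in the comment above, in USD per million input tokens.
OPUS_CACHE_HIT_PER_MTOK = 0.50
HAIKU_UNCACHED_PER_MTOK = 1.00


def input_cost(context_tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD of feeding `context_tokens` of input at the given rate."""
    return context_tokens / 1_000_000 * rate_per_mtok


# Hypothetical session: 200k tokens of accumulated context (assumed figure).
context_tokens = 200_000

# Staying on Opus: the existing context is read from cache.
stay_on_opus = input_cost(context_tokens, OPUS_CACHE_HIT_PER_MTOK)

# Switching to Haiku: the cache is invalid, so the whole context is
# billed as fresh, uncached input on the first turn after the switch.
switch_to_haiku = input_cost(context_tokens, HAIKU_UNCACHED_PER_MTOK)

print(f"stay on Opus (cache hit):   ${stay_on_opus:.2f}")
print(f"switch to Haiku (uncached): ${switch_to_haiku:.2f}")
```

This only covers the first turn after the switch. If the session then stays on the cheaper model long enough, later turns would presumably hit that model's own cache, so the switch can still pay off over many turns; the point is just that it is not free.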
I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.
Anthropic wanting to switch billing to API rates is them just wanting to generate more profit.
Even if subscriptions are locally profitable (i. e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.
Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware, they have no unlimited funding, so they try to optimize things.
And Anthropic is complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.
In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.
Not true. Stop following US media spam if needed.
1. Very recently, the US did close a loophole on sanctions that allowed Chinese companies to use NVIDIA hardware outside of China i.e. before that was closed they all had access. The trick was train outside, do adjustments, ship the disks back and use non-NVIDIA in China, but at least the training and endpoints not hosted in China could all use NVIDIA.
2. There's been plenty of reports including fines and bans e.g. to Supermicro on smuggling NVIDIA hardware to China. I doubt it has been stopped. You can't catch everyone.
Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.
DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.
Then the cost is being subsidized by investor capital, but it is still subsidized.
So they are profitable?
I think you are mismatching accounting terms.
You can't say the 'subscriptions' are profitable without accounting for the cost of making the model that is the source of the subscription.
They are heavily subsidized by the shareholders. Investing, running at a loss, with hope of some future profitability.
If saner factory can sell you the same tool at a fraction of the cost of a gold plated factory, your choice is going to be obvious.
Though the day is coming when there’s no distinguishing, I’m sure.
Also, is it really a defense department when you're starting wars of aggression every 15 years or so?
Just like how changing Kennedy Center letterhead to Trump Kennedy Center for a year didn't actually legally rename it.
Once a case with sufficient standing got in front of a judge it reverted to the actual legal name on the basis that only Congress can change the statutorily defined name.
I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).
Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.
Deepseek V4 Flash is surprisingly capable and insanely cheap. It takes so much to get the session cost to even reach $0.01.
I agree with you on pricing, but what do you mean by this?
Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers? Why aren't they encouraging unions or sectorial bargaining? Why isn't the government mandating any of this?
Americans very rarely benefit when US corporations do well. That needs to change. No one benefits if Meta continues making billions in profit every quarter while society suffers from isolation, depression, suicide, and scams from their services. Americans don't benefit if health insurance companies are making massive profits while they can't afford deductibles.
Our society has been setup to simply extract wealth in all facets of life. That's a sick society and it needs to change.
I'm not saying China does this better, in fact China has some of the worst worker rights out of all the industrialized countries; but at least American consumers would benefit from cheaper higher quality Chinese goods. The world would likely benefit too if America got off the cold war hype train that did nothing to benefit humanity outside of those making weapon systems.
The AI companies sure are a brilliant example of corporations needing to do more to help their employees pay for childcare.
I am on the $100 Max plan.
I had it analyze a project I was working on with Opus 4.8, and it blew through 23% of my session limit in one go. Does not portend well for my budget.
I do wonder if you switched models mid-session, you would have lost all your cache. Reloading the context into cache can really eat through your usage.
> Fable 5's safety measures flagged this message for cybersecurity or biology topics.
> They may flag safe, normal content as well.
> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.
Here are the results of the agentic code review session:
This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.
They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).
Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).
Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.
So it might work if it turns out to be a substantial leap in capability.
The newer models are smarter but really fickle and hard to get meaningful work out of
4.6 was a workhorse
Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.
How many government sanctioned school bombings does it take for them to quit working with said government? For now we know that number is somewhere between infinity and 1.
The question of collaboration with USG is a much more complex one, but is not the one raised above.
Edit: I'll also add that I doubt any AI-doom people "trust" Anthropic per se. The entire angle of questioning – again – misunderstands the AI-doom argument. You appear to think that if companies behave unethically, they cannot be trusted and they will not produce good outcomes, inversely: if they behave ethically, they can be trusted, and they will produce good outcomes.
Any competent AI-doomer would argue that ethics or trust are essentially irrelevant.
The entire problem is that people can act totally reasonably, even ethically, and this is not a guarantee of good outcomes. Situations can be created in which completely ethical, reasonable behavior actually produces a bad outcome. You do not need to assume people are bad in order to produce a bad outcome, and inversely you cannot assume that you will get a good outcome from good people.
"Arms races" are one class of situations that often have this characteristic. "Bureaucracy" is another class that we encounter a lot in daily life. There's a lot of them!
Anthropic needs to be at least somewhat in the good graces of a capricious administration that is already under pressure from businesses and citizens to regulate AI companies across multiple different domains, whether it's energy consumption, job displacement, military and defense applications, surveillance, etc.
If Anthropic wants to survive, they need to acquire influence with the government that most impacts them as an American company, and a massive exporter of services in the AI space to other countries, otherwise they could get locked down and locked out of the market for national security reasons.
It sucks, but sometimes the survival choice is to make an ethical compromise in hopes that you can still be around to make better decisions later.
This "simple" fact needs quite a bit of additional context and work. Making grandiose ethical claims like this can be countered with other grandiose claims such as the fact that there is no ethical existence under communism or socialism.
The fact that there is no ethical consumption under capitalism is not material to whether or not ethical existence is possible under communism or socialism. In order to survive in a capitalist society, one inherently has to make choices that require trade-offs, and those trade-offs are burdened by a history of decisions made not just by the people alive today, but our ancestors as well. Does that mean I walk around chanting "Reparations", "Land-back", or other calls to action? No, but I do acknowledge that there are unresolved issues and as a Canadian, I know we need to do more to resolve treaty issues, and environmental issues, and system discrimination. I also know that Americans need to do better to address systemic discrimination and many, many other issues. It also doesn't mean I want to give back my house, or give away all of my possessions. It just means I try to make good choices and support businesses and people that are open about the trade-offs they make and try to engage as ethically as possible.
Acknowledging those facts doesn't absolve us of responsibility, it's a framework that allows folks concerned about whether or not they are doing the right thing to accept the trade-offs that they choose to make and be responsible and accountable for those choices to themselves or their communities.
We live in a world with scarce resources. It's possible that with a foundational redesign of the global economy, and the requisite authoritarian government that would be required to force such a redesign, we could eliminate food scarcity, solve energy scarcity, and make sure that everyone has a place to live. Those trade-offs are probably not worth the ethical cost in political and physical violence required to accomplish it. We have seen the trade-offs that happen when the powerful are able to exploit communist or socialist governments. We are seeing the "late stage capitalism" impacts of allowing the powerful to exploit capitalism in democratic societies. Acknowledging that the current capitalist system has led to the greatest prosperity for the upper echelon (financially) of humanity, and a dramatic reduction in global poverty shouldn't obscure the reality that much of that wealth comes from exploitation of people and the environment.
It's a huge problem to unwind, and we can't let the burden of every choice that we make stop us from trying to do better, but we (as in society in general) can't do better if we don't at least acknowledge the compromises we are making along the way, and try to plan to fix it in the future.
Probably a topic better suited to beer and a pub setting than HN though :P
I don't believe that this is a fact. How are you demonstrating that this is a fact?
When you talk about things like reparations or "land back" you're already cargo-culting in concepts and ideas that themselves need to be fleshed out in order to make a subsequent claim that a specific economic system is unethical. Someone can just argue all economic systems are unethical, how are you going to defend against that? And can you pay reparations for example without going back in all of human history and finding all cases of injustices and then tallying it up? Why pick an arbitrary point in time? Better yet, why not start in countries where slavery still exists instead of focusing on the west which led the world in abolishing slavery and created concepts such as universal human rights.
Even with respect to "eliminating food scarcity" - eliminate in what sense? All olive groves and grapevines and rice farms have to be destroyed and rebuilt to only build certain foods?
Dabbling in communism or other inhumane and authoritarian governmental systems is extremely dangerous and, in the same vein of extraordinary claims requiring extraordinary evidence, suggesting as you did creating an authoritarian government to create a utopia is precisely the same project of suffering and death that mass murderers throughout history have undertaken to abject failure, and thus, you need some incredible amount of evidence and theory to be able to even fairly suggest going down this path.
I am not going to do the work of gathering the evidence for you, and I don't think this is the right venue for a debate on the topic.
Talk about a strawman!
I think they might be hitting a point where subsidizing the expensive models for subscriptions makes less and less sense.
With Opus 4.X, last month I paid 100 USD for the Max subscription and got a token equivalent of 4.1k USD.
I imagine that Fable is more expensive to run.
(I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)
AI Savings Misses 'Should Be Making Executives Uncomfortable,' Bain Says - https://news.ycombinator.com/item?id=48359010 - June 2026 (0 comments)
AI sticker shock hits corporate America - https://news.ycombinator.com/item?id=48307098 - May 2026 (146 comments)
ZIRP (zero interest rate policy) is over, software engineers no longer call the shots now that there isn’t vast amounts of capital chasing yield, and that capital bidding up salaries and keeping the labor market for engineers tight.
If you are x more productive with generative AI, very shortly you are going to have to prove it with a token budget (or, if you’re lucky, an org willing to spend for on prem hardware for capped token cost, fixed capex vs uncapped opex).
The comparison is not SWE vs SWE with AI. It is SWE vs SWE with AI with a constrained token budget ($x/month) delivering the same value at the same or lower cost. If you cannot prove that you are wildly (vs marginally) more productive with the AI, why would they pay for it? Prove it.
Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?
Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.
Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.
I agree. They need addicts, but they are high on their own supply and everyone else can see the danger in getting hooked.
I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and then the more hallucinations you then miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it only to discover it's too expensive to actually use.
https://cursor.com/evals
Upd: I meant big picture, not with respect to this model release. Where do subscriptions figure into their strategic vision. Will consumers end up paying enterprise prices in the future?
why do they have capacity now that they won't in a few weeks?
They'll probably tighten the quotas to rein in whales though.
Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".
The only reason I pay $200 is that LLM errors cost me that much, at worst. If "make no error" starts working - sure. But surely, unless you have millions of dollars of cash to burn, a coin flip that costs $5000 is an insane idea?
Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.
OTOH, accelerating the push for local LLMs would also be fine with me.
The AI landscape is changing rapidly, with Apple announcing the option to change the AI backend, and potential requirements to enable AI choices as well, similar to EU browser choice requirements (this is more reading tea leaves than any actual requirements I am aware of). The new OS changes coming to support Googlebook, and the deep Copilot/AI integration into Windows, will make maintaining user-facing subscriptions essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.
If they don't maintain that relevance, there is an increasing likelihood that they will get consumed by other companies, whether it's Apple, Microsoft or Google, to form a foundation for their OS, or by other cloud providers.
It's kind of annoying not getting access to the primo model and paying 200 bucks a month. I understand 200 bucks a month is basically nothing though.
Like I don't totally understand why they'd let me have it for a couple weeks and then take it away and say I can have it but I have to pay retail and retail is like $1,000 a day.
It's better to have loved and lost than to have never loved at all??
As a consumer I can choose to buy subscriptions to a range of things, including $5 droplets or VMs on a broad range of cloud hosting providers. I can even buy cheap bare metal at a bunch of providers at an affordable retail rate.
I can also buy "unlimited" AI packages that will be optimized to fit the cost model from a variety of services, with different impacts, such as rolling outages when I consume a daily or hourly allotment.
Right now VC and the investor class are subsidizing the rapid evolution of the services and availability, but that VC is running out. In more traditional economies, AI would have developed and rolled out more slowly, and through metered subscriptions, with the eventual rolling out of "unlimited" packages like telephone, internet, or cell services once the market became commoditized.
We have seen a big inversion of that with the race to "win" AI marketshare. Now the true cost is being exposed, and the most competitive and capable models are hideously expensive to operate, so it makes sense that we are moving to metered billing for a utility service. If you want gas, you can buy regular or premium. If you have a premium car you definitely want the premium, but for most people regular is good.
Give it a couple of years, and the survivors will settle around fairly industry standard models of consumer grade services, pro-sumer accounts, and business/enterprise models.
Things are still shaking out, but I get the sadness. Luckily I work at a big tech company who is banging the drum on doing experimentation so I use my prosumer claude pro and other accounts at home for hobby stuff, and save my heavy lifting and potentially experimentation for work :P
As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage that it makes sense.
It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1
It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.
Probably all about the IPO.
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
[edit] -- I see that this comes from the system card -- dang merged the comments from the other discussion so that explains the confusion.
The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.
Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.
Kind of like billing a programmer by the hour.
Perhaps not that close to US salaries, but those are inflated to hell. Worldwide senior engineers and scientists have salaries just about an order of magnitude away from AI subscriptions that you can use most of the day every day.
Do we know this? I’ve seen evidence they lose money on heavy users. But so do gyms.
Most gyms sell more subscriptions than they can fit under their roof at one time. If a gym only sells to heavy users, it will either be constantly turning members away or have to buy more equipment. Its equipment will wear out faster. Depending on amenities, it will go through towels, soap, water, et cetera faster, too.
Unless they're really, seriously wasteful with the soap.. there's no chance a gym is losing money on a heavy user
Right now all these AI subscriptions are priced like Planet Fitness, but they're used like Equinox. They're hoping that the new a la carte offerings will move their pricing more in that direction as well.
Where?
What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.
I just don’t think I’d use them much at all at home.
If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.
> Fable 5 · Most capable for your hardest and longest-running tasks · Uses your limits ~2× faster than Opus
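For a sense of what "~2× faster" means in practice, a back-of-the-envelope sketch: it assumes the limit is a fixed pool that drains at a constant rate, and the 5-hour baseline is a made-up figure, not a published quota.

```python
def hours_until_limit(baseline_hours: float, burn_multiplier: float) -> float:
    """Hours of use before hitting the limit, given how long the same
    limit lasts on the baseline model and how much faster the new
    model drains it (2.0 means twice as fast)."""
    if burn_multiplier <= 0:
        raise ValueError("burn_multiplier must be positive")
    return baseline_hours / burn_multiplier


# Hypothetical: a limit that lasts 5 hours on Opus, drained 2x faster.
print(hours_until_limit(5.0, 2.0))
```

The real accounting is probably not this linear, so treat it as an upper bound on intuition rather than a prediction.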
Pay-as-you go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.
Sounds like "bait and wait".
If you think about it, the more people pay for these new and more resource-hungry models, the longer it takes for them to become no extra cost, and the longer it takes, the more people are tempted to pay extra.
Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.
Once the freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.
The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.
Is it nice we get the trial? Sure. Is it also a common play in the playbook of tech companies? Yes.
Anthropic does not care about us and isn't going to talk to you either and will extract from you as much as possible.
The true answer is local models.