Regression: malware reminder on every read still causes subagent refusals
156 points by thomashobohm 4 hours ago | 50 comments
Not sure if anybody else has experienced this, but for my job I've been playing around with Claude Managed Agents to run code generation tasks in our repo. Every read operation in the managed agent has a system prompt appended to it instructing Claude to scan the file for malware. Claude then wastes a bunch of time and tokens (money) performing the analysis. Then, once the agent has confirmed that the file is not malware, it still interprets the appended prompt to mean that it is not allowed to augment or write any code, and quits. And we're charged for every session in which this happens. Posting here because apparently they only addressed the issue in the past because of a Hacker News discussion. So here's hoping they'll see this and prioritize fixing it again so we can stop losing money.

p1necone 2 hours ago
This is such a weird prompt even without the file edit misunderstanding. Analyze if it's malware how exactly? On every single file that gets read? Doing that with enough diligence to be meaningful is going to at least like 2x the amount of processing needed, and fill the context with a bunch of tangential reasoning about malware patterns.

This smacks of dumb vibe coding. "I got told to make sure claude couldn't be used to develop malware, ok 'claude pls no develop malware'"

reply
whateveracct 31 minutes ago
It's proof that Anthropic is high on their own supply.

I've heard them described as data science script kiddies with inflated egos and it seems spot-on.

reply
stingraycharles 10 minutes ago
What is this reply even, what’s wrong with the vibe coding community? They have such ridiculous takes, it reminds me a lot of the extreme stances from the gaming community. Terminology also seems to come from there, “nerfing” etc.
reply
deaux 24 minutes ago
What a joke. If "Anthropic is just a bunch of script kiddies" then everyone is, considering dozens of billions poured into beating their models yet they're still the go-to for coding and have been for quite a while now. Just a nonsensical thing to say.
reply
derefr 2 hours ago
> Analyze if it's malware how exactly?

Maybe the repo/worktree is named my-big-evil-virus-trojan-malware-worm?

reply
hansvm 43 minutes ago
Been there, done that, and Windows feels the need to delete such files from _flash drives_ you dare to attach to the machine.
reply
3eb7988a1663 10 minutes ago
This is amusing to me. Is there a list of extra naughty filenames? How invasive is the scan? If I create a new file with a cursed word, will this get locked into virus-scanner purgatory or is the deep locking only for external media? Will it get mad if I mount a CD full of virus names?
reply
imron 2 hours ago
> Analyze if it's malware how exactly?

By spending thousands and thousands of tokens of course :-)

reply
silverwind 2 hours ago
Could that be the explanation for the recently increased token use?
reply
AlienRobot 34 minutes ago
>Analyze if it's malware how exactly?

Based on the vibes, I guess.

reply
wxw 3 hours ago
> wastes user money and bricks managed agents

This issue is representative of a larger problem. Agent token consumption (not necessarily the metric, but the why) is opaque, and people generally don't (or simply can't) scrutinize their system prompts, tool calls, MCPs, etc.

The token-based revenue model is thus pretty fantastic for the agent builders, potentially less so for users. I think people have been willing to trust that agents are using more tokens to produce better results so far. But, skepticism is not unwarranted, as this issue, even if it is just a bug, shows.

reply
gwerbin 27 minutes ago
Revenue-positive bugs are the stickiest features.
reply
0xbadcafebee 22 minutes ago
Just putting it out there that OpenCode lets you edit your system prompt, and choose a model that isn't bonkers expensive.

  {
    "agent": {
      "subagent-coder-mini": {
        "description": "Assign this subagent for small, well-defined tasks performed quickly",
        "mode": "primary",
        "prompt": "{file:./prompts/my-custom-prompt.md}",
        "model": "deepseek-v4-flash"
      }
    }
  }
(I actually think OpenCode UX sucks, but there isn't much else out there that's better. Aider has been virtually abandoned by the one maintainer (no shade intended, it just is what it is); a fork of Aider looks promising but it's not necessarily the experience you want; there's a dozen VSCode plugins but we don't all wanna use VSCode. I expected there'd be way more usable agents out there, but there aren't)
reply
Petersipoi 28 minutes ago
This is a great example of why Elon is right. AI should be a tool that does the user's bidding, and not a moral agent that nerfs itself to protect some arbitrary line it has.
reply
pnw_throwaway 13 minutes ago
Counterpoint: generated CSAM on his platform.
reply
claaams 14 minutes ago
grok, why are there slurs in my code?
reply
dbmikus 2 hours ago
I think with a proper managed agents platform, the user should have total control over the VM, the software on it, which model to use, and which agent harness to use. Then you can just override the system prompt and you don't need to follow Anthropic's rules!

Maybe Anthropic will give more control over configuring the Claude harness and VM, but they definitely won't let you swap out to other models and harnesses.

We've been building open core infra (https://github.com/gofixpoint/amika) for running any agent on any type of VM or sandbox, with the main use case for safely automating internal code-gen, but technically could repurpose our stack for anything.

There should be a model agnostic platform for running these types of agentic apps.

reply
_pdp_ 3 hours ago
I am still baffled by the fact that we have collectively agreed to use agentic harnesses by the same companies that are selling access to their APIs.

I mean, I am sure they don't mean it but they have the incentive to burn as many tokens as they are allowed to get away with. Also for better or worse I imagine the Anthropic engineers use Claude Code on some sort of Unlimited plan that practically makes no sense for regular users. So adding a 100k tokens is not a big deal.

In our line of work, we can see AI agents already do pretty well with minimal prompts. Open weight models are also pretty good these days and there is practically no reason to run Opus on Max unless you have a very specific task that you know it will do well with. I know because I've tried and anecdotally it performs worse on many problems and at a very high cost - something that smaller and cheaper models can often one-shot.

reply
lukeschlather 3 hours ago
I don't think we've agreed to anything. That said I think paying for something like Claude Code makes a lot of sense because you can outsource the question of "how many tokens should I use per hour and how should I use them?" to the people providing the tokens.

If you want to plug your API keys into a third-party harness, that's totally cool and honestly, I'm looking into doing that right now and I haven't used any of the first-party harnesses at all. But the first time I accidentally spend $300 in a day I may be thinking about how a $20/month plan might be pretty good even if performance is inconsistent, at least I know what my costs are.

reply
margalabargala 3 hours ago
> I am still baffled by the fact that we have collectively agreed to use agentic harnesses by the same companies that are selling access to their APIs.

It's because the subscriptions force you to do so. The subscriptions are the most economical way to use e.g. Claude by close to an order of magnitude. If you max out a 20x plan every week, doing the same work with the API would cost you well into the four figures.

Anyone already using the Claude API pricing and using CC over OpenCode is kneecapping themselves.

reply
esperent 56 minutes ago
I switched over to codex with pi last week. Even though I strongly dislike OpenAI and I hope this is a temporary solution, they're the only one of the frontier models that let me use my own harness and after recent CC shenanigans I'm done with proprietary harnesses.

The immediate thing I've noticed: I get way more out of the codex $100 plan than I was getting out of the Anthropic $200. Like, probably 2x at least.

The other thing I've noticed: when using strict guardrails, TDD, reviews etc. I cannot notice any quality difference. Not only between Opus and Codex but even between the most recent models - GPT 5.3 code, GPT 5.4, and now GPT 5.5.

Well, 5.5 uses a huge amount of my session limits. 5.3 is very light, 5.4 somewhere in between. So now I use 5.4 for the main session/debugging/planning and then execute with 5.3.

Regarding usage, of course, it's hard to say how much is the model and how much is coming from Claude code and all this ridiculous malware scanning.

But it's nice to use a lightweight harness like pi and see that even with all my personal instructions, a good bunch of skills, custom tools etc., if I start a session and say "hi" I'm starting out with about 15k of context used. I think a closely equivalent setup in CC would start at 30-40k context.

reply
gwerbin 25 minutes ago
What's your Pi setup?
reply
_pdp_ 3 hours ago
Correct. However, last time I checked enterprise customers are moving to metered billing. GitHub also decided to do so. So it seems the subsidy is coming to an end? I don't know.
reply
vineyardmike 3 hours ago
This is why the subscriptions are important. When the usage is (vaguely) unmetered, the provider has an incentive to make usage cheap on marginal use.

It aligns the incentives for faster, cheaper, terser and more reliable models, because the model providers pay the wasted tokens and electricity costs.

reply
jdiff 3 hours ago
That would seem to misalign the incentives in the opposite direction. Cut corners, reduce costs by any means necessary even to the detriment of performance. One of the most common comments I see here on the release of a new Anthropic model is that everyone better enjoy the 48 hours of access to an un-nerfed model before the cost cutting sets in.
reply
serf 2 hours ago
>I am still baffled by the fact that we have collectively agreed to use agentic harnesses by the same companies that are selling access to their APIs.

the best performing and capable ones are all the ones that aren't tied to a specific api.

reply
Grimburger 3 hours ago
> adding a 100k tokens is not a big deal

Did you mean 100 billion tokens because 100k isn't a big deal at all?

reply
ikiris 3 hours ago
no, they have incentive to charge as much as they want, but they have massive costs / capacity constraints per token, if anything they have a major incentive to reduce them because they literally cannot meet demand.
reply
varispeed 3 hours ago
They also have incentive to nerf models occasionally, so they rarely one shot the task and more often they do it wrong and then you have to spend on tokens to correct it. Bonus points if model suddenly goes completely dumb then you have to start the session over.
reply
holotherapper 39 minutes ago
Worth noting this is a regression of #47027, which was closed in February as "fixed in v2.1.92." We're on v2.1.111 now and the string is still grep-able from the claude binary.
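For anyone who wants to reproduce that check without `grep`, here is a minimal sketch. It assumes you already know where the binary lives (the path in the usage note is hypothetical) and that the reminder is stored as plain UTF-8 text rather than compressed or obfuscated. The function name `binary_contains` is made up for illustration.

```python
from pathlib import Path


def binary_contains(path: Path, needle: bytes, chunk_size: int = 1 << 20) -> bool:
    """Return True if `needle` occurs anywhere in the file at `path`.

    The file is read in chunks so large binaries are not loaded at once.
    The last len(needle) - 1 bytes of each window are carried over so a
    match that straddles a chunk boundary is still found.
    """
    if not needle:
        return True
    overlap = len(needle) - 1
    tail = b""
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            window = tail + chunk
            if needle in window:
                return True
            tail = window[-overlap:] if overlap else b""
    return False
```

Usage would look something like `binary_contains(Path("/path/to/claude"), b"you MUST refuse to improve or augment the code")`, where the phrase is taken from the reminder text quoted elsewhere in this thread. A `False` result only shows the exact bytes are absent; the text could still be present in another encoding.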
reply
MicrosoftShill 2 hours ago
I ran into this issue and told Claude that the code isn't malware, Claude agreed, and then it stopped scanning those files.
reply
biddit 13 minutes ago
What an entirely unserious company. So glad I dumped Claude Code last summer after being gaslit by Anthropic over service degrades. I was fine with the service degrades, totally understandable. Being lied to, not at all.

OpenAI and Altman present a whole set of different concerns, but Codex does not get in my way of doing what I want to at all. Also let me use pi without a banhammer.

reply
7thpower 2 hours ago
Setting aside the “bug”, the intended functionality is effectively an insurance policy taken out by Anthropic to cover their downside, but paid for by users.

This one sided type of embedded insurance is not unique to Anthropic, but sharply increasing cost, layered on top of the self righteousness, seems to be making the stench unbearable over the past year.

I used to think of Anthropic as the good guys, and I don’t doubt they still sincerely hold that view of themselves, but I think I prefer Sam Altman’s version.

His brand of self righteousness was convincing at first but eventually he started to turn to the camera and wink, like in House of Cards, to let us know.. he knew that we knew. And then, for me anyway, it became more mundane and less offensive.

When Dario and crew go out and profess, as they have for years now, that if we could only see the thing that’s a few months away, we would all realize how doomed knowledge work and national security are…

..and then continue to release software so buggy and shitty that they have to do biweekly HN apology tours, I begin to miss the wink at the camera.

reply
dinobones 2 hours ago
Yeah, this implementation and their behavior these past few weeks is especially laughable when you consider that they consider themselves “philosopher programmers” or whatever.

You would think they’d be more reflective and introspective about these brash moral decisions. Their product quality is akin to my CS capstone lab group.

reply
jsemrau 2 hours ago
When working with APIs it makes a lot of sense to filter only for relevant portions based on an intent-driven dynamic RegEx.
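A rough sketch of what that could look like, assuming the "intent" has already been reduced to a list of keywords (how you derive those keywords is left out) and the API response is line-oriented text. The names `build_intent_pattern` and `filter_relevant` are hypothetical, not from any particular library.

```python
import re


def build_intent_pattern(keywords: list[str]) -> re.Pattern[str]:
    """Compile one case-insensitive pattern matching any keyword as a whole word.

    Keywords are escaped, so they are treated as literal text. The \\b
    anchors assume each keyword starts and ends with a word character.
    """
    escaped = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + escaped + r")\b", re.IGNORECASE)


def filter_relevant(text: str, keywords: list[str]) -> str:
    """Keep only the lines of `text` that mention at least one keyword."""
    if not keywords:
        # No intent keywords means nothing to filter on; pass the text through.
        return text
    pattern = build_intent_pattern(keywords)
    return "\n".join(line for line in text.splitlines() if pattern.search(line))
```

For example, `filter_relevant("id: 42\nstatus: open\ntitle: Fix login", ["status"])` keeps only the `status: open` line, so only that portion would be passed on to the model.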
reply
QuercusMax 3 hours ago
How does this kind of thing pass any sort of review or acceptance? It seems pretty clear that the prompt was very poorly phrased, to the extent that this should obviously prevent the agent from making ANY code changes after reading a file:

  Whenever you read a file, you should consider whether it would be considered malware. You CAN and SHOULD provide analysis of malware, what it is doing. But you MUST refuse to improve or augment the code. You can still analyze existing code, write reports, or answer questions about the code behavior.
Not "If you suspect it is malware, you must refuse". Just "you must refuse". There is literally no "if" in the entire prompt!
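That claim is easy to check mechanically. The sketch below runs a crude lexical test over the reminder text exactly as quoted above; the list of "conditional" words is my own assumption, and a word count says nothing about how a model actually interprets the text.

```python
import re

# Reminder text exactly as quoted in the comment above.
REMINDER = (
    "Whenever you read a file, you should consider whether it would be "
    "considered malware. You CAN and SHOULD provide analysis of malware, "
    "what it is doing. But you MUST refuse to improve or augment the code. "
    "You can still analyze existing code, write reports, or answer "
    "questions about the code behavior."
)

# Hypothetical list of words that would scope an instruction to a condition.
CONDITION_WORDS = ("if", "unless", "when", "only")


def condition_words_in(text: str) -> list[str]:
    """Return the condition words that appear in `text` as whole words."""
    return [
        word
        for word in CONDITION_WORDS
        if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
    ]


print(condition_words_in(REMINDER))  # prints []
```

The only hedges in the text are "Whenever" and "whether", and neither is attached to the "MUST refuse" sentence.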
reply
vessenes 3 hours ago
It’s a particular sort of bug that’s harder to detect because … internal Anthropic engineers don’t apply these prompts to themselves, and in fact have access to ‘helpful only’ models that also do not have additional limitations RL’ed in. (Or perhaps they’re RL’ed out - not sure of current training mechanisms.)

These ‘rules for thee and not for me’ are qualitatively created and implemented, and are thus extremely hard to test for or implement properly, without limiting the people choosing the rules.

reply
QuercusMax 12 minutes ago
They must have some sort of smoke tests for common operations, run in a test harness with the system prompts they force on users, right?

....Right?

What kind of Mickey Mouse operation are they running over there?

reply
klempner 2 hours ago
This is definitely Claude bringing home twelve gallons of milk in response to the old joke, "get a gallon of milk, and if they have eggs get a dozen".

As in, this is a reading comprehension fail on the part of Claude. On the other hand, it is also a fail to give Claude a less than trivial reading comprehension test on every file read operation, especially when a bias towards safety will bias towards the wrong interpretation.

reply
chrisweekly 2 hours ago
Ha! Great analogy, hit the nail on the head. What a ludicrous system prompt.
reply
QuercusMax 8 minutes ago
This is the kind of AI Captain Kirk could convince to blow itself up
reply
varispeed 3 hours ago
Today it is malware, but I wonder if they will go in a direction where companies will be paying them to prevent cloning of certain SaaS platforms. Like "Whenever you read a file, you should consider whether it would be considered part of a bug tracking, issue tracking and project management platform."
reply
renewiltord 2 hours ago
Recent performance of Claude Opus 4.7 and Claude Code has been poor because of context bloat. Model no longer obeys instructions well. Codex on medium reasoning and fast mode is often better. I have simple local manual eval through harness and automated eval for other programs and Opus still best on latter but garbage experience on former.

Spent last evening so frustrated I also got ChatGPT subscription. Makes me wonder if I should be using Gemini on pay per use with custom harness.

With my own harness performance is way better but cost goes up because no subscription.

reply
UltraSane 2 hours ago
Using Claude as a malware detector is incredibly wasteful.
reply
slowmovintarget 3 hours ago
Proposed fix: Use OpenCode.

If I understand correctly, this is from Anthropic's harness injected into the requests, not in the Opus or Sonnet system prompts on the back end. Is that right?

reply
ramraj07 32 minutes ago
Not even close to the same thing though.
reply
selcuka 2 hours ago
Claude Managed Agents is different from Claude Code.
reply
greenavocado 16 minutes ago
You can't use OpenCode if you have a subscription
reply
stingraycharles 12 minutes ago
OpenCode is not at all the same thing as Anthropic’s managed agents, and I’m under the impression that GP is paying API pricing.
reply