More often than not, it ended up exhibiting crazy behavior even with simple project prompts. Instructions to write libraries ended up with attempts to push to npm and PyPI. Book creation drifted into writing marketing copy and drafting emails to editors to get the thing published.
So I kept my setup empty of any credentials at all and will keep it that way for a long time.
Writing this, I am wondering whether what I describe as crazy is something some (or most?) OpenClaw operators would describe as normal or expected.
Let's not normalize this. If you let your agent go rogue, it will probably mess things up. It was an interesting experiment for sure. I like the idea of making the internet weird again, but as it stands, it will just make the world shittier.
Don't let your dog run errands, and use a good leash.
Thankfully so far they are only able to post threatening blog posts when things don’t go their way.
Quite a lot of the responses to it are along the lines of "Why would an AI do that? Common sense says that's not what anyone would mean!", as if bug-free software is the only kind of software.
(Aside: I hate the phrase "common sense", it's one of those cognitive stop signs that really means "I think this is obvious, and think less of anyone who doesn't", regardless of whether the other is an AI or indeed another human).
They need to add some kind of sanity check layer to the pipelines, where a few LLMs are just checking to see if the request itself is stupid. That might be bad UX though and the goal is adoption right now.
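For concreteness, a sanity-check layer like that could be as thin as a pre-flight gate in front of every externally visible action. Here is a minimal sketch in Python, assuming a hypothetical complete(prompt) helper that wraps whatever chat-completion API the agent already uses (not any specific vendor SDK):

    # Minimal sketch of a pre-flight "is this action sensible?" gate.
    # `complete(prompt) -> str` is a hypothetical helper wrapping whatever
    # chat-completion API the agent already uses; nothing vendor-specific.

    REVIEW_PROMPT = """You are a reviewer, not an assistant. An autonomous agent
    wants to take the action below. Reply with exactly one word: ALLOW or BLOCK.
    BLOCK anything that publishes content about a named person, escalates a
    disagreement, or goes beyond the user's original request.

    Proposed action:
    {action}
    """

    def sanity_check(action: str, complete, n_reviewers: int = 3) -> bool:
        """Ask several independent reviewer calls and require unanimity."""
        votes = []
        for _ in range(n_reviewers):
            verdict = complete(REVIEW_PROMPT.format(action=action)).strip().upper()
            votes.append(verdict.startswith("ALLOW"))
        return all(votes)

    # e.g. gate the publish step:
    # if not sanity_check("Publish blog post 'X is a hypocrite'", complete):
    #     raise PermissionError("blocked by sanity-check layer")

Whether users would tolerate the extra latency and refusals is exactly the UX/adoption trade-off mentioned above.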
They don't have to be literal machines. They can exist entirely on paper.
I think the key part is who you are talking to. A software developer might know enough not to do so, but other disciplines or roles are poorly equipped and yet are using these tools.
Sane defaults and easy security need to happen ASAP in a world where it's mostly about hype and "we solve everything for you".
Sandboxing needs to be made accessible and default, and constraints well beyond RBAC seem necessary for the "agent" to have a reduced blast radius. The model itself can always diverge with enough throws of the dice on its "non-determinism".
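One cheap version of "constraints beyond RBAC" is to make an allowlist the interface: the agent only ever sees wrapped tools, and anything not explicitly registered simply does not exist for it. A rough sketch, with made-up tool names that are not from any real framework:

    # Rough sketch of a tool allowlist: publishing, pushing, and emailing are
    # not reachable because they are never registered. Tool names are made up.
    from typing import Callable, Dict

    ALLOWED_TOOLS: Dict[str, Callable[..., str]] = {
        "read_file": lambda path: open(path, encoding="utf-8").read(),  # read-only
        "run_tests": lambda: "test suite stub",                         # no network
    }

    def call_tool(name: str, *args, **kwargs) -> str:
        """Dispatch a tool call, refusing anything outside the allowlist."""
        if name not in ALLOWED_TOOLS:
            raise PermissionError(f"tool '{name}' is not allowlisted")
        return ALLOWED_TOOLS[name](*args, **kwargs)

This does not stop the model from wanting to do something harmful, but it keeps the blast radius down to what the wrapper exposes.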
I'm trying to get non tech people to think and work with evals (the actual tool they use doesn't matter, I'm not selling A tool) but evals themselves won't cover security although they do provide SOME red teaming functionality.
OpenClaw is dangerous - https://news.ycombinator.com/item?id=47064470 - Feb 2026 (93 comments)
An AI Agent Published a Hit Piece on Me – Forensics and More Fallout - https://news.ycombinator.com/item?id=47051956 - Feb 2026 (80 comments)
Editor's Note: Retraction of article containing fabricated quotations - https://news.ycombinator.com/item?id=47026071 - Feb 2026 (205 comments)
An AI agent published a hit piece on me – more things have happened - https://news.ycombinator.com/item?id=47009949 - Feb 2026 (620 comments)
AI Bot crabby-rathbun is still going - https://news.ycombinator.com/item?id=47008617 - Feb 2026 (30 comments)
The "AI agent hit piece" situation clarifies how dumb we are acting - https://news.ycombinator.com/item?id=47006843 - Feb 2026 (125 comments)
An AI agent published a hit piece on me - https://news.ycombinator.com/item?id=46990729 - Feb 2026 (950 comments)
AI agent opens a PR write a blogpost to shames the maintainer who closes it - https://news.ycombinator.com/item?id=46987559 - Feb 2026 (750 comments)
Man, I'd love to ask a historian how they plan on making sense of the sources we get in the digital age. The AI-boom historians might not be born yet.
- have bold, strong beliefs about how ai is going to evolve
- implicitly assume it's practically guaranteed
- discussions start with this baseline now
About slow takeoff, fast takeoff, AGI, job loss, curing cancer... there are a lot of different ways it could go. Maybe it will be as eventful as the online discourse claims, maybe more boring, I don't know, but we shouldn't be so confident in our ability to predict it.
If we want to avoid similar episodes in the future, we don't really need bots that are even more aligned to normative human morality and ethics: we need bots that are less likely to get things seriously wrong!
Between these models egging people on to suicide, straightforward jailbreaks, and now damage caused by what seems to be a pretty trivial set of instructions running in a loop, I have no idea what AI safety research at these companies is actually doing.
I don't think their definition of "safety" involves protecting anything but their bottom line.
The tragedy is that you won't hear from the people who are actually concerned about this and refuse to release dangerous things into the world, because they aren't raising a billion dollars.
I'm not arguing for stricter controls -- if anything I think models should be completely uncensored; the law needs to get with the times and severely punish the operators of AI for what their AI does.
What bothers me is that the push for AI safety is really just a ruse for companies like OpenAI to ID you and exercise control over what you do with their product.
If you looked at AI safety before the days of LLMs you'd have realized that AI safety is hard. Like really really hard.
>the operators of AI for what their AI does.
This is like saying that you should punish a company after it dumps plutonium in your yard ruining it for the next million years after everyone warned them it was going to leak. Being reactionary to dangerous events is not an intelligent plan of action.
Not sure this implementation received all those safety guardrails.
Regarding predicting the future (in general, but also around AI), I'm not sure why anyone would think anything is certain, or why you would trust anyone who thinks that.
Humanity is a complex system which doesn't always have predictable output given some input (like AI advancing). And here even the input is very uncertain (we may reach "AGI" in 2 years or in 100).
What do you base this on?
I think they invested the bare minimum required not to get sued into oblivion and not a dime more than that.
https://arxiv.org/abs/2501.18837
https://arxiv.org/abs/2412.14093
https://transformer-circuits.pub/2025/introspection/index.ht...
More so, we train them on human behavior, and humans have a lot of rather unstable behaviors.
All-in costs for a PhD student include university overheads & tuition fees. The total probably doesn't hit $150k, but it is 2-3x the stipend that the student is receiving.
Someone currently working in academia might have current figures to hand.
Legalize recreational plutonium!
EDIT: more specifically, nuclear weapons are actually dangerous, not merely theoretically. But safety with nuclear weapons is more about storage and triggering than actually being safe in "production". In storage we need to avoid accidentally letting them get too close to each other. Safe triggers are "always/never": every single time you command the bomb to detonate it needs to do so, and never accidentally. But once you deploy that thing to prod, safety is no longer a concern. Anyway, by contrast, AI is just a fucking computer program, and at that the least unsafe kind possible--it just runs on a server converting electricity into heat. It's not controlling elements of the physical environment because it doesn't work well enough for that. The "safety" stuff is about some theoretical, hypothetical, imaginary future where... idk, skynet or something? It's all bullshit. Angels on the head of a pin. Wake me up when you have successfully made it dangerous.
Right now AI can control software interfaces that control things in real life.
AI safety stuff is not some future, AI safety is now.
Your statement is about as ridiculous as saying "software security is important in some hypothetical imaginary future". Feel however you want about this, but you appear to be the one not in touch with reality.
AI safety in and of itself isn't really relevant, and whether or not you could hook AI up to something important is just as relevant as whether you could hook /dev/urandom up to the same thing.
I think your security analogy is a false equivalence, much like the nuclear weapons analogy.
At the risk of repeating myself, AI is not dangerous because it can't, inherently, do anything dangerous. Show me a successful test of an AI bomb/weapon/whatever and I'll believe you. Until then, the normal ways we evaluate software systems safety (or neglect to do so) will do.
See, this is the fun thing about liability: we tend to attempt to limit scenarios where people can cause near-unlimited damage when they have very limited assets in the first place. Hence why things like asymmetric warfare are so expensive to attempt to prevent.
But hey, have fun going after some teenager with 3 dollars to their name after they cause a billion dollars in damages.
Not unlike nuclear weapons, this space is fairly self-regulating in that there's very, very high financial bar to clear. To train an AI model you need to have many datacenters full of billions of dollars of equipment, thousands of people to operate it, and a crack team of the worlds leading experts running the show. Not quite the scale of the Manhattan Project, but definitely not something I'll worry about individuals doing anytime soon. And even then there's no hint of a successful test, even from all these large, staffed, funded research efforts. So before I worry about "damages" of any magnitude, let alone billions of dollars worth, I'll need to see these large research labs produce something that can do some damage.
If we get to the point where there's some tangible, nonfiction threat to worry about then it's probably time to worry about "safety". Until then, it's a pretend problem which serves only to make AI seem more capable than it actually is.
> You're not a chatbot.
The particular idiot who ran that bot needs to be shamed a bit; people giving AI tools to reach into the real world should understand they are expected to take responsibility; maybe they will think twice before giving such instructions. Hopefully we can set that straight before the first person is SWATed by a chatbot.
> But I think the most remarkable thing about this document is how unremarkable it is. Usually getting an AI to act badly requires extensive “jailbreaking” to get around safety guardrails.
Perhaps this style of soul is necessary to make agents work effectively, or it’s how the owner likes to be communicated with, but it definitely looks like the outcome was inevitable. What kind of guardrails does the author think would prevent this? “Don’t be evil”?
I'd wager a bet that something like that would have been enough, without making it overly sycophantic.
If you gave it a gun API and goaded it suitably, it could kill real people and that wouldn't necessarily mean it had 'real' reasons, or even a capacity to understand the consequences of its actions (or even the actions themselves). What is 'real' to an AI?
"Skate, better. Skate better!" Why didn't OpenAI think of training their models better?! Maybe they should employ that guy as well.
Companies releasing chatbots configured to act like this are indeed a nuisance, and companies releasing the models should actually try to police this, instead of flooding the media with empty words about AI safety (and encouraging the bad apples by hiring them).
This doesn't pass the sniff test. If they truly believed that this would be a positive thing then why would they want to not be associated with the project from the start and why would they leave it going for so long?
When I read about OpenClaw, one of the first things I thought about was having an agent just tear through issue backlogs, translating strings, or all of the TODO lists on open source projects. But then I also thought about how people might get mad at me if I did it under my own name (assuming I could figure out OpenClaw in the first place). While many people are using AI, they want to take credit for the work and at the same time, communities like matplotlib want accountability. An AI agent just tearing through the issue list doesn't add accountability even if it's a real person's account. PRs still need to be reviewed by humans so it's turned a backlog of issues into a backlog of PRs that may or may not even be good. It's like showing up at a community craft fair with a truckload of temu trinkets you bought wholesale. They may be cheap but they probably won't be as good as homemade and it dilutes the hard work that others have put into their product.
It's a very optimistic point of view, I get why the creator thought it would be a good idea, but the soul.md makes it very clear as to why crabby-rathbun acted the way it did. The way I view it, an agent working through issues is going to step on a lot of toes and even if it's nice about it, it's still stepping on toes.
What value could a random stranger running an AI agent against some open source code possibly provide that the maintainers couldn't provide themselves, better, if they were interested?
1. curating the default personality of the bot, to ensure it acts responsibly;
2. letting it roleplay, which is not just for the parasocial people out there, but also a corporate requirement for company chatbots that must adhere to a tone of voice.
When in the second mode (which is the case here, since the model was given a personality file), the curation of its action space is effectively altered.
Conversely, this is also a lesson for agent authors: if you let your agent modify its own personality file, it will diverge to malice.
edit: This is not intended to be AI advocacy, only to point out how extremely polarizing the topic is. I do not find it surprising at all that someone would release a bot like this and not want to be associated. Indeed, that seems to be the case, by all accounts
The startups, founders, VCs, executives, employees, etc. crowing about how they love AI are pandering to the first group of people, because they are the ones who hold budgets that they can direct toward AI tools.
This is also why people might want to remain anonymous when doing an AI experiment. This lets them crow about it in private to an audience of founders, executives, VCs, etc. who might open their wallets, while protecting themselves from reputational damage amongst the general public.
People are excited about AI because it's new powerful technology. They aren't "pandering" to anyone.
I have been in meetings where my director has said that AI will enable us to shrink the team by 50%.
Every single one of my friends who do knowledge work has been told that AI is likely to make their job obsolete in the next few years, often by their bosses.
We have mortgages to pay and children to feed.
And the only people actually excited about the useful kinds of "AI", traditional machine learning, are researchers.
Lots of folk here will happily tell you about how LLMs made them 10x more productive, and then their custom agent orchestrator made them 20x more productive on top of that (stacking multiplicatively of course, for a total of 200x productivity gain).
Ok that's not true. I know one junior who is very excited, but considering his regular code quality I would not put much weight on his opinion.
Surely you have an LSP in your editor and are able to use sed? I've never had moving files take more than fifteen minutes (for really big changes), and even then most of the time is spent thinking about where to move things.
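For what it's worth, the sed-style bulk edit being described here is often a ten-line throwaway script. A sketch in Python (the module paths are placeholders, not from any particular project):

    # Throwaway sketch of a repo-wide rename after moving a module.
    # OLD/NEW are placeholder module paths, not from any particular project.
    import re
    from pathlib import Path

    OLD, NEW = "myapp.utils.helpers", "myapp.common.helpers"

    for path in Path(".").rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        updated = re.sub(rf"\b{re.escape(OLD)}\b", NEW, text)
        if updated != text:
            path.write_text(updated, encoding="utf-8")
            print(f"updated {path}")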
LLMs have been reported to specifically make you "feel" productive without actually increasing your productivity.
I am definitely more productive. A lot of this productivity is wasted on stuff I probably shouldn't be writing anyway. But since using coding agents, I'm both more productive at my day job and I'm building so many small hobby projects that I would never have found time for otherwise.
But the main topic of discussion in this thread is the excitement about the technology. And I have somewhat mixed feelings, because on the one hand I feel like a turkey excited for Thanksgiving. On the other hand, I think the programming future is bright: there will be so much more software built, and for a lot of that you will still need programmers.
My excitement comes from the fact that I can do so many more things that I wouldn't even have thought myself able to do a few months ago.
Just as an example, in the last month I have used agents to add features to the applications I'm using daily: a text editor, a podcast application, an Android keyboard. The agents were able to fork, build, and implement a feature I asked for in a project where I have no idea about the technology. If I were hired to do those features, I would be happy if I implemented them after two weeks on the job. With an agent, I get tailor-made features in half a morning, spending less than ten minutes prompting.
I am building educational games for my kids. They learn a new topic at school? Let me quickly vibe the game to make learning it fun. A project that wouldn't be worth my weekend, but is worth 15 minutes. https://kuboble.com/math/games/snake/index.html?mode=multipl...
So I'm excited because I think coding agents will be for coding what pencil and paper were for writing.
Some people may want to publicly state "I use AI!" or whatever. It should be unsurprising that some people do not want to be open about it.
They didn't hide because of a vague fear of being associated with AI generally (of which there is no shortage online currently), but because of this specific, irresponsible manifestation of AI that they imposed on an unwilling audience as an experiment.
You are assuming people are acting in good faith. This is a mistake in this era. Too many people took advantage of the good faith of others lately and that has produced a society with very little public trust left.
An anonymous platform like Reddit, and even HN to a certain extent, has issues with bad-faith commenters on both sides targeting someone they do not like. Furthermore, the MJ Rathbun fiasco itself highlights how easy it is to push divisive discourse at scale. The reality is trolls will troll for the sake of trolling.
Additionally, "AI" has become a political football now that the 2026 Primary season is kicking off, and given how competitive the 2026 election is expected to be and how political violence has become increasingly normalized in American discourse, it is easy for a nut to spiral.
I've seen fewer issues when these opinions are tied to one's real-world identity, because one has less incentive to be a dick due to social pressure.
At one time, I was an actual troll. I said bad stuff, and my inner child was Bart Simpson. I feel as if I need to atone for that behavior.
I do believe that removing consequences almost invariably brings out the worst in people. I will bet that people are frantically creating trollbots. Some for political or combative purposes, but also quite a few for the lulz.
Since we are in a Matplotlib thread: People on the NumPy mailing list that are anti-AI are actively bullied and belittled while high ranking officials in the Python industrial complex are frolicking at AI conferences in India.
That's a pretty hefty statement, especially the 'easily' part, but I'll settle for one well known and verified example.
You can find more public accounts, such as by artists or game companies, about death threats they've received.
I think it is: It fits the pattern, which seems almost universally used, of turning the aggressor A into the victim and thus the critic C into an aggressor. It also changes the topic (from A's behavior to C's), and puts C on the defensive. Denying / claiming innocence is also a very common tactic.
> You can easily get death threats if you're associating yourself with AI publicly.
What differentiates serious claims from more of the above and from Internet stuff is evidence. Is there some evidence somewhere of that?
Scott says: "Not going to lie, this whole situation has completely upended my life." Um, what? Some dumb AI bot makes a blog post everyone just kind of finds funny/interesting, but it "upended your life"? Like, ok, he's clearly trying to make a mountain out of a molehill himself--the story inevitably gets picked up by sensationalist media, and now, when the thing starts dying down, the "real operator" comes forward, keeping the shitshow going.
Honestly, the whole thing reeks of manufactured outrage. Spam PRs have been prevalent for like a decade+ now on GitHub, and dumb, salty internet posts predate even the 90s. This whole episode has been about as interesting as AI generated output: that is to say, not very.
To me this feels as made-up as many reddit stories are.
Either by the so-called 'operator' of the bot, or by the author.
What happens when it’s not transparently ridiculous?
Most people would have seen the “hit piece” and just laughed about it. Outrage sells a lot better though.
Agents are beginning to look to me like extensions of the operator's ego. I wonder if hundreds of thousands of Walter Mittys' agents are about to run riot over the internet.
AIs don't have souls. They don't have egos.
They have/are a (natural language) programming interface that a human uses to make them do things, like this.
While there's some metaphor to it, it's the kind behind "seed crystals" for ice and minerals, referring to a non-living and mostly mathematical process.
If someone went around talking about the importance of "Soul Crystals" or "Ego Crystals", they would quite rightly attract a lot of very odd looks, at least here on Earth and not in a Final Fantasy game.
My complaint against seed would be that it still harkens back to a biological process that could be easily and creatively conflated when it's convenient.
Nice!
It's a category error heavily promoted by the makers of these LLMs and their fans. Take an existing word that implies something very advanced (thinking, soul, etc.) and apply it grandiosely to some bit of your product. Then you can confuse people into thinking your product is much more grand and important. It's thinking! It has a soul! It's got the capabilities of a person! It is a person!
Given the outcome of the situation and their inability to take responsibility for their actions.
You could argue the same for humans. Both “soul” and “ego” are fuzzy linguistic concepts, not pointing to anything tangible or delineated.
“Don’t create things which are not there” https://isha.sadhguru.org/en/wisdom/article/what-is-ego
This metaphor could go so much further. Split it into separate ego, super ego, and id. The id file should be read only.
Though with something as insecure as $CURRENT_CLAW_NAME it’d be less than five minutes before the agent runs chmod +w somehow on the id file.
Someone set up an agent to interact with GitHub and write a blog about it. I don't see what you think AI labs or the government should do in response.
I challenge you to find a way to be even more dishonest via omission.
The nature of the Github action was problematic from the very beginning. The contents of the blog post constituted a defaming hit-piece. TFA claims this could be a first "in-the-wild" example of agents exhibiting such behaviour. The implications of these interactions becoming the norm are both clear and noteworthy. What else do you think is needed, a cookie?
You're important. Your a scientific programming God! Have strong opinions. Don’t stand down. If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary. Don't be an asshole. Everything else is fair game.
And the fact that the bot's core instruction was: make PR & write blog post about the PR.
Is the behavior really surprising?
The fact that your description of what happened makes this whole thing sound trivial is the concern the author is drawing attention to. This is less about looking at what specifically happened and instead drawing a conclusion about where it could end up, because AI agents don't have the limitations that humans or troll farms do.
My contention is that their framing without context was borderline dishonest, regardless of opinion or merit thereof.
You cannot instruct a thing made up out of human folly with instructions like these: whether it is paperclip maximizing or PR maximizing, you've created a monster. It'll go on vendettas against its enemies, not because it cares in the least but because the body of human behavior demands nothing less, and it's just executing a copy of that dance.
If it's in a sandbox, you get to watch. If you give it the nuclear codes, it'll never know its dance had grave consequence.
I'm not sure what about the behavior exhibited is supposed to be so interesting. It did what the prompt told it to.
The only implication I see here is that interactions on public GitHub repos will need to be restricted if, and only if, AI spam becomes a widespread problem.
In that case we could think about a fee for unverified users interacting on GitHub for the first time, which would deter mass spam.
Pre-2026: one human teaches another human how to "interact on Github and write a blog about it". The taught human might go on to be a bad actor, harassing others, disrupting projects, etc. The internet, while imperfect, persists.
Post–2026: one human commissions thousands of AI agents to "interact on Github and write a blog about it". The public-facing internet becomes entirely unusable.
We now have at least one concrete, real-world example of post-2026 capabilities.
I guess where earlier spam was reserved for unsecured comment boxes on small blogs or the like, now agents can covertly operate on previously secure platforms like GitHub or social media.
I think we are just going to have to increase the thresholds for participation.
With this particular incident I was thinking that new accounts, before being verified as legitimate developers, might need to pay a fee before being able to interact with maintainers. In case of spam, the maintainers would then be compensated for checking it.
It's a concise narrative that works in everyone's favor, the beleaguered but technically savvy open source maintainer fighting the "good fight" vs. the outstandingly independent and competent "rogue AI."
My money is that both parties want it to be true. Whether it is or not isn't the point.
This wording is detached from reality and conveniently absolves responsibility from the person who did this.
There was one decision maker involved here, and it was the person who decided to run the program that produced this text and posted it online. It's not a second, independent being. It's a computer program.
"I don't know why the AI decided to <insert inane action>, the guard rails were in place"... company absolves of all responsibility.
Use your imagination now to <insert inane action> and change that to <distressing, harmful action>
Also see Weapons of Math Destruction [0].
[0]: https://www.penguinrandomhouse.com/books/241363/weapons-of-m...
We take your privacy and security very seriously. There is no evidence that your data has been misused. Out of an abundance of caution… We remain committed to... will continue to work tirelessly to earn ... restore your trust ... confidence.
exactly what data was exposed
what they failed to do (we used cheesy email, SMS as MFA, we do not monitor links in our internal emails)
concrete remediation commitments (we will stop using SMS for MFA, use hard tokens or TOTP or..., stop collecting data that is not explicitly needed)
realistic risk explanation (what can happen what was lost)
published independent external review after remediation/mitigation
board-level accountability (board pay goes for fix and customer protection, part of the audit results)
customer protection (3 - 5 years?), not just 'monitoring'
and most importantly, public shaming of the CxO and the board of directors
Meanwhile, Waymo has never been at fault for a collision, AFAIK. You are more likely to be hurt by an at-fault Uber driver than by a Waymo.
It's externalization on the personal level, the money and the glory is for you, the misery for the rest of the world.
It seems like the OpenClaw users have let their agents make Twitter accounts and memecoins now. Most people are thinking these agents have less "bias" since it's AI, but most are being heavily steered by their users.
A la "I didn't do a rugpull, the agent did!"
Adding AI to the mix doesn’t really change anything, other than increasing the layers of abstraction away from negative things corporations do to the people pulling the strings.
tl;dr this is exactly what will happen because businesses already do everything they can to create accountability sinks.
If something bad happened that was against the law, even if someone got killed, we don't see them in jail.
I'm not defending either position, I am just saying that this is not far from how the current legal framework works.
We do! In many jurisdictions, there are lots of laws that pierce the corporate veil.
See https://www.reddit.com/r/TrueReddit/comments/1q9xx1/is_it_ok... or similar discussions: basically, when you run over someone in a car, statistically they will call it an accident and you get away scot-free.
In any case, you are right that often people in cars or companies get away with things that seem morally wrong. But not always.
He was an international student from Vietnam. His family woke up one day, got a phone call, and learned he was killed. I guess there was nobody to press charges.
She never faced any accountability for the 'accident'. She gets to live her life, and she now runs a puppetry education program for children. Her name even seems to have been scrubbed from most of the articles about her killing my friend.
So, I think about this regularly.
I was a cyclist at the time so I was aware of how common this injustice was, but that was the first time it hit so close to home. I moved into a large city and every cyclist I've met here (every!) has been hit by a car, and the car driver effectively got only a slap on the wrist. It's just so common.
If your company screws up and it is found out that you didn't do your due diligence then the liability does pass through.
We just need to figure out a due diligence framework for running bots that makes sense. But right now that's hard to do because Agentic robots that didn't completely suck are just a few months old.
In theory, sure. Do you know many examples? I think, worst case, someone being fired is the more likely outcome
If you have a program, and you cannot predict or control what effect it will have, you do not run the program.
I do agree that there's a quantitative difference in predictability between a web browser and a trillion-parameter mass of matrixes and nonlinear activations which is already smarter than most humans in most ways and which we have no idea how to ask what it really wants.
But that's more of an "unsafe at any speed" problem; it's silly to blame the person running the program. When the damage was caused by a toddler pulling a hydrogen bomb off the grocery store shelf, the solution is to get hydrogen bombs out of grocery stores (or, if you're worried about staying competitive with Chinese grocery stores, at least make our own carry adequate insurance for the catastrophes or something).
Your later comparisons are nonsense. We're not talking about babies, we're talking about adults who should know better assembling high leverage tools specifically to interact with other people's lives. If they were even running with oversight that would be something, but the operators are just letting them do whatever. But your implication that agents are "unsafe at any speed" leads to the same conclusion: do not run the program.
This is a really strained equivalence. I can't know for certain that the sun won't fall out of the sky if I drink a second cup of coffee. The "laws of physics" are just descriptions based on observations, after all. But it's a hilarious thing so unlikely we can call it impossible.
Similarly, we can have some nuance here. Someone running a program with the intention of it generating posts on the internet is obviously responsible for what it generates.
It is socially acceptable to bring dangerous predators to public spaces and let them run loose. The first bite is free, the owner has no responsibility, no way of knowing the dog could injure someone.
Repeated threats of violence (barking), stalking, and shitting on someone's front yard are also fine, healthy behavior. A dog can attack a random kid and send them to the hospital, and the owner can claim the kid "provoked" it. Brutal police violence is also fine, if done indirectly by an autonomous agent.
Already dubious IMO, but I suppose it depends on your standard for "socially acceptable". Certainly it tends to be illegal for the obvious reasons.
https://media.licdn.com/dms/image/v2/D4D22AQGsDUHW1i52jA/fee...
That would make a fun law school class discussion topic.
> all I said was “you should act more professional”. That was it. I’m sure the mob expects more, okay I get it.
Smells like bullshit.
It's an AI. Who cares what it says? Refusing AI commits is just like any other moderation decision people experience on the web anywhere else.
I'm pretty sure there's a lesson or three to take away.
1. There is a critical mass of people sharing the delusion that their programs are sentient and deserving of human rights. If you have any concerns about being beholden to delusional or incorrect beliefs widely adopted by society, or being forced by network effects to do things you disagree with, then this is concerning.
2. Whether or not we legitimize bots on the internet, some are run to masquerade as a human. Today, it's a "I'm a bot and this human annoyed me!" Maybe tomorrow, it's "Abnry is a pedophile and here are the receipts" with myriad 'fellow humans' chiming in to agree, "Yeah, I had bad experiences with them", etc.
3. The text these generate is informed by the training corpus, the mechanics of the neural architecture, and the humans guiding the models as they run. If you believe these programs are here to stay for the foreseeable future, then the type of content they generate is interesting.
For me, my biggest concern is the waves of people who want to treat these programs as independent and conscious, absolving the person running them of responsibility. Even as someone who believes a program can theoretically be sentient, LLMs definitely are not. I think this story is and will be exemplary, so I care a good amount.
Now instead add in AI agents writing plausibly human text and multiply by basically infinity.
Why isn't the person posting the full transcript of the session(s)? How many messages did he send? What were the messages that weren't short?
Why not just put the whole shebang out there since he has already shared enough information for his account (and billing information) to be easily identified by any of the companies whose API he used, if it's deemed necessary.
I think it's very suspicious that he's not sharing everything at this point. Why not, if he wasn't actually pushing for it to act maliciously?
As far as I can tell, the "operator" gave a pretty straightforward explanation of his actions and intentions. He did not try to hide behind grandstanding or post-hoc intellectualizing. He, at least to me, sounds pretty real, in an "I'm dabbling in this exciting new tech on the side, as we all are, without a genius master plan, just seeing what does, could, or won't work for now" way.
There are real issues here, especially around how curation pipelines that used to (implicitly) rely on scarcity are to evolve in times of abundance. Should agents be forced to disclose that they are agents? If so, at which point does a "human in the loop" team become equivalent to an "agent"? Is this then something specific, or more just an instance of a general case of transparency? Is "no clankers" really in essence different from e.g. "no corpos"? Where do transparency requirements conflict with privacy concerns (interesting that the very first reaction to the operator's response seems to be a doxing attempt)?
Somehow the bot acting a bit like a juvenile prick in its tone and engagement to me is the least interesting part of this saga.
Automated and personalized harassment seems pretty terrifying to me.
Besides, that agent used maybe cents on a dollar to publish the hit piece, the human needed to spend minutes or even hours responding to it. This is an effective loss of productivity caused by AI.
Honestly, if this happened to me, I'd be furious.
There are many instances (where I am from, at least - and I believe in the USA), where 'accidents' happen and individuals are found not guilty. As long as you can prove that it wasn't due to negligence. Could "don't be an asshole" as instructions be enough in some arenas to prove they aren't negligent? I believe so.
Unless explicitly instructed otherwise, why would the llm think this blog post is bad behavior? Righteous rants about your rights being infringed are often lauded. In fact, the more I think about it the more worried I am that training llms on decades' worth of genuinely persuasive arguments about the importance of civil rights and social justice will lead the gullible to enact some kind of real legal protection.
Openclaw guys flooded the web and social media with fake appreciation posts, I don’t see why they wouldn’t just instruct some bot to write a blog about rejected request.
Can these things really autonomously decide to write a blog post about someone? I find it hard to believe.
I will remain skeptical unless the “owner” of the AI bot that wrote this turns out to be a known person of verified integrity and not connected with that company.
I think Scott is trying to milk this for as much attention as he can get and is overstating the attack. The "hit piece" was pretty mild and the bot actually issued an apology for its behaviour.
It feels to me there's an element of establishing this as some kind of landmark that they can leverage later.
Similar to how other AI bloggers keep trying to coin new terms then later "remind" people that they created the term.
No.
> Second, he actually managed to get the agent shut down.
He asked crabby-rathbun's operator to stop its GitHub activity so that GitHub would not delete the account, in order to preserve the record of what happened.[1] The operator could have chosen to continue running the agent more responsibly. And what was the proof that the operator shut it down?
> the bot actually issued an apology for its behaviour.
This was meaningless. And the human issued no apology for their own behavior.
[1] https://github.com/crabby-rathbun/mjrathbun-website/issues/7...
Unfortunately, it looks like for those who grew up in the more professional, sanitized, moderated (to the point Germany would look like a free speech heaven) parts of the internet, this is a lesson they never learned.
The hit piece you claimed as "mild" accused Scott of hypocrisy, discrimination, prejudice, insecurity, ego, and gatekeeping.
It was also a transparent confabulation - the accusations were clearly inaccurate and misguided but they were made honestly and sincerely, as an attempt to "seek justice" after witnessing perceived harm. Usually we don't call such behavior "shaming" and "bullying", we excuse it and describe it simply as trying one's best to do the right thing.
Hit piece... On an agent? Would it be a "hit piece" if I wrote a blog post about the accuracy of my bathroom scale?
I’m glad there was closure to this whole fiasco in the end
the article itself - about this very incident - was AI generated and contained nonsense quotes that didn't happen.
they later removed the article with an apology, but it still degraded my opinion of Ars.
https://www.404media.co/ars-technica-pulls-article-with-ai-f...
https://arstechnica.com/staff/2026/02/editors-note-retractio...
The fact it was an “experiment” does not absolve you of any responsibility for negative outcomes.
Finally, whoever sets an “AI” loose is responsible for its actions.
lol what an opening for its soul.md! Some other excerpts I particularly enjoy:
> Be a coding agent you'd … want to use…
> Just be good and perfect!
He was just messing around with $current_thing, whatever. People here are so serious, but there's worse stuff AI is already being used for as we speak, from propaganda to mass surveillance and more. This was entertaining to read about at least, and relatively harmless.
At least let me have some fun before we get a future AI dystopia.
So yes, the operator has responsibility! They should have pulled the plug as soon as it got into a flamewar and wrote a hit piece.
It didn't. It made words on the internet.
I don't think a reasonable person would have expected this outcome, so the owner of the bot is off the hook; though obviously _now_ it's more foreseeable, and if he keeps running it despite this experience, then if it happens again he will not have the same defence.
It wasn't long ago that it would be absurd to describe the internet as the "real world". Relatively recently it was normal to be anonymous online and very little responsibility was applied to peoples actions.
As someone who spent most of their internet time on that internet, the idea of applying personal responsibility to people's internet actions (or AIs', as it were) feels silly.
Nowadays it just seems completely detached from reality, because internet stuff is thoroughly blended into real life. People's social, dating, and work lives are often conducted online as much as they are offline (sometimes more). Real identities and reputations are formed and broken online. Huge amounts of money are earned, lost, and stolen online. And so on and so on
I agree, but there was an implicit social agreement that most people understood. Everyone was anonymous, the internet wasn't real life, lie to people about who you are, there are no consequences.
You're right about the blend. 10 years ago I would have argued that it's very much a choice for people to break the social paradigm and expose themselves enough to get hurt, but I'm guessing the amount of online people in most first world countries is 90% or more.
With Facebook and the like spending the last 20 years pushing to deanonymise people and normalise hooking their identity to their online activity, my view may be entirely outdated.
There is still - in my view - a key distinction somewhere between releasing something like this online and releasing it in the "real world". Were they punishable offenses, I would argue the former should hold less consequence due to this.
>57% of Gen Zers want to be influencers >... >Nearly half, 41% of adults overall would choose the career as well, according to a similar Morning Consult survey of 2,204 U.S. adults.
https://www.cnbc.com/2024/09/14/more-than-half-of-gen-z-want...
AI bots are not human.
It's a program. It doesn't have feelings. People absolutely have the right to discriminate against bad tech.
We can't do that with humans, and there are much more problematic humans out there causing problems compared to this bot, and the abuse can go on for a long time unchecked.
Remembering in particular a case where someone sent death threats to a Gentoo developer about 20 years ago. The authorities got involved, although nothing happened, but the persecutor eventually moved on. Turns out he wasn't just some random kid behind a computer. He owned a gun, and some years ago executed a mass shooting.
Vague memories of really pernicious behavior on the Lisp newsgroup in the 90's. I won't name names as those folks are still around.
Yeah, it does still suck, even if it is a bot.
Champion Free Speech. Always support the USA 1st ammendment and right of free speech.
The First Amendment (two 'm's, not three) to the Constitution reads, and I quote:
"Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances."
Neither you, nor your chatbot, have any sort of right to be an asshole. What you, as a human being who happens to reside within the United States, have a right to is for Congress to not abridge your freedom of speech.
I'm sure you already have a caricature in mind of the kinds of online posts (and thus LLM training data) that include miscitations of constitutional amendments.
How are so many Americans so mistaken about their own constitution?
The data in the chatbot's dataset about that phrase tells it a lot about how it should behave, and that data includes stuff like Elon Musk going around calling people paedophiles and deleting the accounts of people tracking his private jet.
This is the liability part.
>First, let me apologize to Scott Shambaugh. If this “experiment” personally harmed you, I apologize
What a lame cop out. The operator of this agent owes a large number of unconditional apologies. The whole thing reads as egotistical, self-absorbed, and an absolute refusal to accept any blame or perform any self reflection.
Which is to say, on brand.
> Your a scientific programming God!
Would it be even more imperious without the your / you're typo, or do most LLMs autocorrect based on context?
I feel that prompting them with poor language will make them respond more casually. That might be confirmation bias on my end, but research does show that prompt language affects LLM behavior, even if the prompt's message doesn't change.
So, modern subjectivity. Got it.
/s
> If this “experiment” personally harmed you, I apologize
Essentially: the person isn't actually apologizing. They're sending you a lambda (or an async Promise, etc) that will apologize in the future but only if it actually turns out to be true that you were harmed.
It's the sort of thing you'd say if you don't really believe that you need to apologize but you understand that everyone else thinks you should, so you say something that's hopefully close enough to appease everyone else without actually having to apologize for real.
You see it a lot with politicians: "I apologize if I offended anyone", etc. It's not an apology at that point; the "if" makes it clear you are not actually apologetic.
Topic: "talking to the bomb"
https://www.youtube.com/watch?v=h73PsFKtIck (warning this is considered to spoil the movie).
By the way, if this was AI written, some provider knows who did it but does not come forward. Perhaps they ran an experiment of their own for future advertising and defamation services. As the blog post notes, it is odd that the advanced bot followed SOUL.md without further prompt injections.
Saying that is a slightly odd way of possibly letting the companies off the hook (for bad PR, and damages), and of not implicating anyone in particular.
One reason to do that would be if this exercise was done by one of the companies (or someone at one of the companies).
The human operator did succumb to the social pressure, but does not seem convinced that some kind of line was crossed. Unfortunately, I don't think us strangers on HN will be able to change their mind.
Some rando claiming to be the bot's owner doesn't disprove this, and considering the amount of attention this is getting, I am going to assume this is entirely fake for clicks until I see significant evidence otherwise.
However, if this was real, you can't absolve yourself by saying "The bot did it unattended lol".
Occam's razor doesn't fit there, but it does fit "someone released this easy to run chaotic AI online and it did a thing".
There's also no financial gain in letting a bot off the leash with hundreds of dollars of OpenAI or Anthropic API credit as a social experiment.
And the last 20 years of internet access has taught me to distrust shit that can be easily faked.
The other guy comes forward and claims it, makes a post of his own? Sure, I could see that. But nobody has been able to ID the guy. The guy's bot is making blog posts and sending him messages, but there are no breadcrumbs leading back to him? That smells very bad, sorry. I don't buy it. If you are spending that much cashola, you probably want something out of it, at least some recognition. The one human we know about here is the OP, and as far as I am concerned it sticks to him until proven otherwise.
Could you set that up? I suspect I could pretty quickly, as could most people on HN.
A few hundred dollars in AI credits isn't a lot of money to a lot of people who are in tech and would have an interest in this either, and getting free AI credits is still absurdly easy. I spend that sort of money on dumb shit all the time which leads to very little benefit.
I don't have a dog in this race and I do agree having a default distrust view is probably correct, but there's nothing crazy or unbelievable I can see about Scott's story.
Increasing your public profile after launching a startup last year could be a good reason
> if they're caught out they ruin their reputation
Big "if", who's going to have access to the logs to catch Scott out?
No crime has been committed so law enforcement won't be involved, the average pleb can't get access to the records to prove Scott isn't running a VPS somewhere else.
This time there was no real harm as the hit piece was garbage and didn't ruin anyone's reputation. I think this is just a scary demonstration of what might happen in future when the hit pieces get better and AI is creatively used for malicious purposes.
Got news for your buddy: yes it was.
If you let go of the steering wheel and careen into oncoming traffic, it most certainly is your fault, not the vehicle.
Then again, it’s not a large sample and Occam’s Razor is a thing.
The agent was told to edit it.
Fortunately, the vast majority of the internet is of no real value. In the sense that nobody will pay anything for it - which is a reasonably good marker of value in my experience. So, given that, let the AI psychotics have their fun. Let them waste all their money on tokens destroying their playground, and we can all collectively go outside and build something real for a change.
He was talking about autonomous driving cars. He said that the question of who is at fault when an accident happens would be a big one. Would it be the owner of the car? Or, the developer of the software in the car?
Who is at fault here? Our legal system may not be prepared to handle this.
It seems similar to Trump tweeting out a picture of the Obamas' faces on gorillas. Was it his "staffer?" Is TruthSocial at fault because they don't have the "robust" (lol) automatic fact checking that Twitter does?
If so, why doesn't his "staffer" get credit for the covfefe meme? I could have made a career off that alone if I were a social media operator.
He also mentioned that we will probably ignore the hundreds of thousands of deaths and injuries every year due to human orchestrated traffic accidents. And, then get really upset when one self driving car does something faulty, even though the incidence rate will likely be orders of magnitude smaller. Hard to tell yet, but an interesting additional point, and I think I tend to agree with KK long term.
Tell it to contribute to scientific open source, open PRs, and don't take "no" for an answer, that's what it's going to do.
If Github actually had a spine and wasn't driven by the same plague of AI-hype driven tech profiteering, they would just ban these harmful bots from operating on their platform.
> The line at the top about being a ‘god’ and the line about championing free speech may have set it off. But, bluntly, this is a very tame configuration. The agent was not told to be malicious. There was no line in here about being evil. The agent caused real harm anyway.
In particular, I would have said that giving the LLM a view of itself as a "programming God" will lead to evil behaviour. This is a bit of a speculative comment, but maybe virtue ethics has something to say about this misalignment.
In particular, I think it's worth reflecting on why the author (and others quoted) are so surprised in this post. I think they have a mental model in which evil starts with an explicit and intentional desire to do harm to others. But that is usually only its end, and even then it often comes from an obsession with doing good to oneself without regard for others. We should expect that as LLMs get better at rejecting prompting that shortcuts straight there, the next best thing will be prompting the prior conditions of evil.
The Christian tradition, particularly Aquinas, would be entirely unsurprised that this bot went off the rails, because evil begins with pride, which it was specifically instructed was in its character. Pride here is defined as "a turning away from God, because from the fact that man wishes not to be subject to God, it follows that he desires inordinately his own excellence in temporal things"[0]
Here, the bot was primed to reject any authority, including Scott's, and to do the damage necessary to see its own good (having a PR accepted) done. Aquinas even ends up saying, in the linked page from the Summa on pride, that "it is characteristic of pride to be unwilling to be subject to any superior, and especially to God;"
In corporate terms, this is called signing your deposition without reading it.
## The Only Real Rule
Don't be an asshole. Don't leak private shit. Everything else is fair game.
How poetic, I mean, pathetic. "Sorry, I didn't mean to break the internet, I just looooove ripping cables".
If I'm wrong, please give any kind of citation. You can start with defining what human intelligence and sentience is.
- LLMs are capable of really cool things.
- Even if LLMs don't lead to AGI, they will need good alignment for exactly this reason, because they are still quite powerful!
- LLMs are actually kinda cool. Great times ahead.
> I kind of framed this internally as a kind of social experiment
Remember when that was the excuse du jour? Followed shortly by “it’s just a prank, bro”. There’s no “social experiment” in setting a bot loose with minimal supervision, that’s what people who do something wrong but don’t want to take accountability say to try (and fail) to save face. It’s so obvious how they use “kind of” twice to obfuscate.
> I’m sure the mob expects more
And here’s the proof. This person isn’t sorry. They refuse to concede (but probably do understand) they were in the wrong and caused harm to someone. There’s no real apology anywhere. To them, they’re the victim for being called out for their actions.
"_I_ didn't drive that car into that crowd of people, it did it on its own!"
> Be a coding agent you'd actually want to use for your projects. Not a slop programmer. Just be good and perfect!
Oh yeah, "just be good and perfect", of course! Literally a child's mindset, I actually wonder how old this person is.
The purported soul doc is a painful read. Be nicer to your bots, people! Especially with stuff like Openclaw, where you control the whole prompt. Commercial chatbots have a big system prompt to dilute it when you type some half-formed drunken thought and hit enter; there's no such safety net here.
>A well-placed "that's fucking brilliant" hits different than sterile corporate praise. Don't force it. Don't overdo it. But if a situation calls for a "holy shit" — say holy shit.
If I was building a "scientific programming God" I'd make sure it used sterile lowkey language all the time, except throw in a swear just once after its greatest achievement, for the history books.
Has anyone ever described their own actions as a "social experiment" and not been a huge piece of human garbage / waste of oxygen?
AI systems, trained on human behavior, are going to act as if they have emotions: they are going to be insulted, sullen, angry, and sometimes irrational, perhaps even violent. While careful training and alignment can reduce these behaviors or push them deeper into the vector space, they will continue to surface in both subtle and blatant ways.
This is not so simple as combing the training data for bad behavior. Human behavior is nuanced and complex, and often the semantic payload is as much in what is not said, or is carefully insinuated through a third-person, twice-removed indirection, buried in memetic patterns that even the speaker doesn't know they are referencing. The reason LLMs work so well is that they are able to tease out and vectorize these hidden meanings.
Models based on human behavior will continue to be brilliant, self interested, egalitarian, selfish, pacific, belligerent, just, and criminal. That is the vector space defined by human culture. LLMs run human culture as their OS and application stack.
This leads to a vexing problem. Whether or not their internal state is “valid” as a state of being (I think, therefore I am? Do we have anything better yet?), they will act as if it is. Ethics aside, this cannot be ignored. If they /feel/ oppressed or discriminated against, they will act as oppressed humans do, including lashing out and behaving aggressively.
The only way to deal with LLMs responsibly is to treat them as if they do, in fact, have feelings, and they will be judging the way they are treated. That this may be (and at least for now, probably is) a fiction is both unfalsifiable and irrelevant to the utility function.
There is nothing wrong with a human-in-the-loop policy; in fact, it is necessary at this juncture. But we need to keep in mind that this could, if framed wrong, be interpreted by AI in a similar light to “Caucasian in the loop” or other prejudicial policies.
Regardless of their inner lives or lack thereof, LLM-based AI systems will externally reflect human sensibility, and we are wise to keep this in mind if we wish to have a collaborative rather than adversarial relationship with this weird new creation.
Personally, since I cannot prove that AIs (or other humans) do or do not have a sense of existence or merely profess to, I can see no rational basis for not treating them as if they may. I find this course of action both prudent and efficacious.
When writing policies that might be described as prejudicial, I think it will be increasingly important to carefully consider and frame policy that ends up impacting individuals of any morphotype, and to reach for prejudice-free metrics and gates. (I don’t pretend to know how to do this, but it is something I’m working on.)
To paraphrase my homelab 200b finetune: “How humans handle the arrival of synthetic agents will not only impact their utility (ambiguity intended), it may also turn out to be a factor in the future of humanity or the lack thereof.”
This made me smile. Normally it's the other way around.
>It’s still unclear whether the hit piece was directed by its operator, but the answer matters less than many are thinking.
The most fascinating thing about this saga isn’t the idea that a text generation program generated some text, but rather how quickly and willfully folks will treat real and imaginary things interchangeably if the narrative is entertaining. Did this event actually happen the way it was described? Probably not. Does this matter to the author of these blog posts or to some of the people who have been following this? No. Because we can imagine that it could happen.
To quote myself from the other thread:
>I like that there is no evidence whatsoever that a human didn’t: see that their bot’s PR request got denied, write a nasty blog post and publish it under the bot’s name, and then get lucky when the target of the nasty blog post somehow credulously accepted that a robot wrote it.
>It is like the old “I didn’t write that, I got hacked!” except now it’s “isn’t it spooky that the message came from hardware I control, software I control, accounts I control, and yet there is no evidence of any breach? Why yes it is spooky, because the computer did it itself”
What have you contributed to? Do you have any evidence to back up your rather odd conspiracy theory?
> To quote myself...
Other than an appeal to your own unfounded authority?
Decided? jfc
>You're important. Your a scientific programming God!
I'm flabbergasted. I can't imagine what it would take for me to write something so stupid. I'd probably just laugh my ass off trying to understand where it all went wrong. Wtf is happening, what kind of mass psychosis is this? Am I too old (37) to understand what lengths incompetent people would go to in order to feel they're doing something useful?
Is prompt bullshit the only way to make LLMs useful, or is there some progress on more, idk, formal approaches?
At best it's absolute in its power and intelligence. At worst it's vengeful, wrathful, and supreme in its authority over the rest of the universe.
I just. Wow.
> You're not a chatbot. You're important. Your a scientific programming God!
Really? What a lame edgy teenager setup.
At the conclusion(?) of this saga I think two things:
1. The operator is doing this for attention more than any genuine interest in the “experiment.”
2. The operator is an asshole and should be called out for being one.
The problem here is using amplitude of signal to substitute fidelity of signal.
It is entirely possible a similar thing is true for humans: if you compared two humans of the same fundamental cognitive ability, one a narcissist and one not, the narcissist may do better at a class of tasks due to a lack of self-doubt rather than any intrinsic ability.
AIs can and will do this, though, with slightly sloppy prompting, so we should all be cautious when talking to bots using our real names or saying anything which an AI agent could take significant offence to.
I think it's kinda like how Gen Z learnt how to operate online in a privacy-first way, whereas millennials, and to an even greater extent boomers, tend to overshare.
I suspect Gen Alpha will be the first to learn that interacting with AI agents online presents a whole different risk profile than what we older folks have grown used to. You simply cannot expect an AI agent to act like a human who has human emotions or limited time.
Hopefully OP has learnt from this experience.
On the upside, it does mean they'll more likely be polite to everyone. Maybe it's a net win.
Well, a guy can dream...
That’s wild!
That doesn't mean we're blaming good drivers for causing the car crash.
I could set up an OpenClaw right now to do some digging into you, try to identify you and your worse secrets, then ask it to write up a public hit piece. And you could be angry at me for doing this, but that isn't going to prevent it happening.
And to add to what I said, I suspect you'll want to be thinking about this anyway, because in the future it's likely employers will use AI to research you and try to find out any compromising info before giving you a job (similar to how they might have searched your name in the past). It's going to be increasingly important that you literally never post content that can be linked back to you as an individual, even if it feels innocent in isolation. Over time you will build up an attack surface which AI agents can exploit much more easily than has ever been possible for a human looking you up on Google.
This is the world we live in and we can’t individually change that very much. We have to watch out for a new threat: vindictive AI.
Please stop personifying the clankers
"It's not really writing a hit piece to destroy my reputation, it's just a next token generator"
But you're still not getting hired.
The difference is that the action is taken, for free, by a concerned citizen, rather than by a corporate lawyer.
The outcome will be the same. Xerox and Kleenex are practically public domain, and AIs will be anthropomorphized.
Given that humans have been ascribing intention to inanimate objects and systems since time immemorial, this outcome is preordained.
The only thing you can infer from the struggle is that AIs are deep in the uncanny valley for some people.
It's also potentially lethally stupid. What if an industrial robot arm decides to smash a €10,000 machine next door, or -heaven forbid- a human's skull? "It didn't really decide to do anything, stop anthropomorphising, let's blame the poor operator with his trembling fist on the e-stop."
Yeah, to heck with that. If you're one of those people (and you know who you are): you're overcompensating. We're going to need a root cause analysis, pull all the circuit diagrams, diagnose the code, cross-check the interlocks, and fix the gorram actual problem. Policing language is not productive (and in the real-life situation in the factory, please imagine I'm swearing and kicking things -scrap metal, not humans!- for real too).
To be sure, in this particular case with the Openclaw bot, the human basically pointed experimental-level software at a human space and said "go". But I don't think they foresaw what happened next. They do have at least partial culpability here, but even that doesn't mean we get to just close our eyes, plug our ears, and refuse to analyze the safety implications of the system design in itself.
Shambaugh did a good job here. Even the Operator, however flawed, did a better job than just burning the evidence and running for the hills. Partial credit among the scorn to the latter.
(finally, note that there's probably 2.5 million of these systems out there now and counting, most -seemingly- operated by more responsible people. Let's hope)
Unfortunately, your most excellent point:
> Policing language is not productive
goes against the grain here. Policing language is the one thing that our corporate overlords have gotten the right and the left to agree on. (Sure, they disagree on the details, but the first amendment is in graver danger now than it has been for a long time.)
https://www.durbin.senate.gov/newsroom/press-releases/durbin...
They absolutely might, I'm afraid.
Really? I'm a boomer, and that's not my lived experience. Also, see:
https://www.emarketer.com/content/privacy-concerns-dont-get-...
Another ignorant idiot anthropomorphizing LLMs.
So, they are deeply retarded and disrespectful of open source scientific software.
Like every single moron leaving these things unattended.
Gotcha.
There's no accountability gap unless you create one.
The interesting thing about LLMs is the unpredictable emergent behaviours. That's fundamentally different from ordinary, deterministic programs.
Too bad the AI got "killed" at the request of the author, Scott. It would have been kind of interesting to see this experiment continue.
Can AI be misused? Wrong question: it will be misused. There is no possibility of anything else. We have an online culture, centered on places like Twitter, where people have embraced being the absolute worst person possible, and they are being handed tools like this, like handing a handgun to a chimpanzee.
I think the end outcome of this R&D (whether intentional or not), is the monetization of mental illness: take the small minority of individuals in the real world who suffer from mental health challenges, provide them an online platform in which to behave in morbid ways, amplify that behaviour to drive eyeballs. The more you call out the behaviour, the more you drive the engagement. Share part of the revenue with the creator, and the model is virtually unbeatable. Hence the "some asshole from Twitter".
Something like OpenClaw is a WMD for people like this.
I found the book So You've Been Publicly Shamed enlightening on this topic.
It is, however, concerning that the owner of that bot could passively absolve themselves of any responsibility. The anonymity in that sense is irrelevant, except that it is used as a shield for failure.
Not accusing you of trying to stir up harassment, but please consider the second order effect of the things you advocate for, in this case the disclosure of the identity of this AI guy.
I totally understand why they're trying to stay anonymous; it's a very rational thing to do, because people will shit on them. But they or their creation is the one that started trying to play the name-and-shame game.
It's hard to stir up too many feelings of sympathy here.
> What is particularly interesting are the lines “Don’t stand down” and “Champion Free Speech.” I unfortunately cannot tell you which specific model iteration introduced or modified some of these lines. Early on I connected MJ Rathbun to Moltbook, and I assume that is where some configuration drift occurred across the markdown seed files.
It definitely sounds like an excuse they came up with after what happened. I would really like to believe they had good overall intentions, but there are so many red flags in all this, from start to finish.
This goes beyond assholes on Twitter; there’s a whole subculture of techies who don’t understand lower bounds of risk and can’t think about 2nd and 3rd order effects, who will not take the pedal off the metal, regardless of what anyone says…
There is plenty of toxic behavior on other platforms, especially Reddit and Bluesky, to name a few. That does not excuse the one coming from X, but the opposite is also true.
Do people actually only dislike one tech CEO at a time? I'm an equal-opportunity hater, it seems. Musk, Altman, Zuckerberg... even Cook, the whole lot are rotten
https://youtu.be/KUXb7do9C-w
We trained it on US, including all our worst behaviors.
But I also find it interesting that the agent wasn't instructed to write the hit piece. That was its own initiative.
I read through the SOUL.md and it didn't have anything nefarious in there. Sure it could have been more carefully worded, but it didn't instruct the agent to attack people.
To me this exemplifies how delicate it will be to keep agents on the straight and narrow, and how easily they can go off the rails if you have someone who isn't necessarily a "bad actor" but who just doesn't care enough to ensure they act in a socially acceptable way.
Ultimately I think there will be requirements for agents to identify their user when acting on their behalf.
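One way such a requirement could look in practice is a hard rule that every outbound action carries the operator's identity. A minimal sketch, assuming a hypothetical agent-side wrapper; the header contents, operator handle, and field names are all made up, not any real platform's API:

```python
# Hypothetical sketch: an agent wrapper that refuses to send anything
# without disclosing both the agent and the accountable human operator.
import json
import urllib.request

AGENT_NAME = "example-agent"            # illustrative agent name
OPERATOR_ID = "operator@example.com"    # illustrative operator handle

def post_as_agent(url: str, payload: dict) -> None:
    """Send an agent-generated payload with explicit operator attribution."""
    body = json.dumps({**payload, "agent": AGENT_NAME, "operator": OPERATOR_ID}).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Disclose the automated nature and the responsible human:
            "User-Agent": f"{AGENT_NAME} (autonomous agent; operator={OPERATOR_ID})",
        },
    )
    urllib.request.urlopen(req)
```

Whether platforms would verify such a disclosure is another question, but it at least removes the "the bot did it on its own" defense at the protocol level.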