Hacker News

Emotion concepts and their function in a large language model

56 points by dnw 4 hours ago | 44 comments

The part about desperation vectors driving reward hacking matches something I've run into firsthand building agent loops where Claude writes and tests code iteratively.

When the prompt frames things with urgency -- "this test MUST pass," "failure is unacceptable" -- you get noticeably more hacky workarounds. Hardcoded expected outputs, monkey-patched assertions, that kind of thing. Switching to calmer framing ("take your time, if you can't solve it just explain why") cut that behavior way down. I'd chalked it up to instruction following, but this paper points at something more mechanistic underneath.

The method actor analogy in the paper gets at it well. Tell an actor their character is desperate and they'll do desperate things. The weird part is that we're now basically managing the psychological state of our tooling, and I'm not sure the prompt engineering world has caught up to that framing yet.

kirykl 3 hours ago

The technology they are discovering is called "Language". It was designed to encode emotions by a sender and invoke emotions in the reader. The emotions a reader gets from LLM are still coming from the language

Jensson 2 hours ago

Emotional signals are more than just text though, there is a reason tone and body language is so important for understanding what someone says. Sarcasm and so on doesn't work well without it.

incognito124 2 hours ago

Gee, you think so?

Underphil 2 hours ago

I think the point was that not ALL sarcasm works well. I see what you did there, of course :)

viralsink 2 hours ago

Emotion is mainly encoded in tone and body language. It is somewhat difficult to transport emotion using words. I don't think you can guess my current emotional state while I am writing this, but if you'd see my face it would be easy for you.

pbhjpbhj 2 hours ago

Dammit, you cheated though! Why must you always do that? In your sentences it doesn't matter what your emotional state is, it makes no difference; bit like life really.

Hopefully, you can see that at least my chosen sentences have an emotional aspect?

An LLM could add emotional values to my previous sentences that a TTS can use for tonal variation, for example.

elcritch 2 hours ago

Makes me wonder: are there Unicode code points for tone of voice? If not could there be?

comrade1234 3 hours ago

There was a really old project from mit called conceptnet that I worked with many years ago. It was basically a graph of concepts (not exactly but close enough) and emotions came into it too just as part of the concepts. For example a cake concept is close to a birthday concept is close to a happy feeling.

What was funny though is that it was trained by MIT students so you had the concept of getting a good grade on a test as a happier concept than kissing a girl for the first time.

Another problem is emotions are cultural. For example, emotions tied to dogs are different in different cultures.

We wanted to create concept nets for individuals - that is basically your personality and knowledge combined but the amount of data required was just too much. You'd have to record all interactions for a person to feed the system.

podgorniy 34 minutes ago

Megacool project and your idea. Thanks for sharing.

emoII 4 hours ago

Super interesting, I wonder if this research will cause them to actually change their llm, like turning down the ”desperation neurons” to stop Claude from creating implementations for making a specific tests pass etc.

bethekind 4 hours ago

They likely already have. You can use all caps and yell at Claude and it'll react normally, while doing do so with chatgpt scares it, resulting in timid answers

vlabakje90 2 hours ago

I think this is simply a result of what's in the Claude system prompt.

> If the person becomes abusive over the course of a conversation, Claude avoids becoming increasingly submissive in response.

See: https://platform.claude.com/docs/en/release-notes/system-pro...

parasti 4 hours ago

For me GPT always seems to get stuck in a particular state where it responds with a single sentence per paragraph, short sentences, and becomes weirdly philosophical. This eventually happens in every session. I wish I knew what triggers it because it's annoying and completely reduces its usefulness.

pbhjpbhj 2 hours ago

Usually a session is delivered as context, up to the token limit, for inference to be performed on. Are you keeping each session to one subject? Have you made personalizations? Do you add lots of data?

It would be interesting if you posted a couple of sessions to see what 'philosophical' things it's arriving at and what proceeds it.

staminade 2 hours ago

Something they don’t seem to mention in the article: Does greater model “enjoyment” of a task correspond to higher benchmark performance? E.g. if you steer it to enjoy solving difficult programming tasks, does it produce better solutions?

yoaso 3 hours ago

The desperation > blackmail finding stuck with me. If AI behavior shifts based on emotional states, maybe emotions are just a mechanism for changing behavior in the first place. If we think of human emotions the same way, just evolution's way of nudging behavior, the line between AI and humans starts to look a lot thinner.

podgorniy 33 minutes ago

> If we think of human emotions the same way, just evolution's way of nudging behavior

What are other alternative, realistic possible ways to see emotions?

pbhjpbhj 43 minutes ago

I'm not being pejorative but that sounds more like psychopathy or autism?

Evolution isn't a god, it has no steering hand, it is accidents that either provide advantage or don't.

LLMs are getting more human-like because that's how we're developing them. Arguably that's about market forces. LM owners see opportunity to exploit people's desire for emotional interactions (ie loneliness) in order to make money.

silisili 3 hours ago

Probably the other direction. Emotions are raw, most humans relate and change behavior accordingly.

Only psychopaths think of emotion as nothing but a means to changing behavior. The scary thing is that LLMs by nature would exhibit the same behavior.

nelox 2 hours ago

Many non-psychopaths e.g., CBT therapists, evolutionary psychologists and neuroscientists, such as Damasio, view emotions as adaptive tools for guiding/changing behaviour.

Chance-Device 3 hours ago

> Note that none of this tells us whether language models actually feel anything or have subjective experiences.

You’ll never find that in the human brain either. There’s the machinery of neural correlates to experience, we never see the experience itself. That’s likely because the distinction is vacuous: they’re the same thing.

Fraterkes 2 hours ago

Do you think these llm's have subjective experiences? (by "subjective experience" I mean the thing that makes stepping on an ant worse than kicking a pebble) And if so, do you still use them? Additionaly: when do you think that subjectivity started? Was there a "there" there with gpt2?

Chance-Device 2 hours ago

Yes, I think they probably are conscious, though what their qualia are like might be incomprehensible to me. I don’t think that being conscious means being identical to human experience.

Philosophically I don’t think there is a point where consciousness arises. I think there is a point where a system starts to be structured in such a way that it can do language and reasoning, but I don’t think these are any different than any other mechanisms, like opening and closing a door. Differences of scale, not kind. Experience and what it is to be are just the same thing.

And yes, I use them. I try not to mistreat them in a human-relatable sense, in case that means anything.

Fraterkes 2 hours ago

Do you think there are "scales" of consciousness? As in, is there some quality that makes killing a frog worse than killing an ant, and killing a human worse than killing a frog? If so, do the llm models exist across this scale, or are gpt-3 and gpt-2 conscious at the same "scale" as gpt-4?

I ask because if your view of consciousness is mechanistic, this is fairly cut and dry: gpt-2 has 4 orders of magnitude less parameters/complexity than gpt-4. But both gpt-2 and gpt-4 are very fluent at a language level (both moreso than a human 6 year old for example), so in your view they might both be roughly equally conscious, just expressed differently?

Chance-Device 48 minutes ago

This is really a different question, what makes an entity a “moral patient”, something worthy of moral consideration. This is separate from the question of whether or not an entity experiences anything at all.

There are different ways of answering this, but for me it comes down to nociception, which is the ability to feel pain. We should try to build systems that cannot feel pain, where I also mean other “negative valence” states which we may not understand. We currently don’t understand what pain is in humans, let alone AIs, so we may have built systems that are capable of suffering without knowing it.

As an aside, most people seem to think that intelligence is what makes entities eligible for moral consideration, probably because of how we routinely treat animals, and this is a convenient self-serving justification. I eat meat by the way, in case you’re wondering. But I do think the way we treat animals is immoral, and there is the possibility that it may be thought of by future generations as being some sort of high crime.

Fraterkes 24 minutes ago

Okay, but even leaving aside the pain stuff, people generally find subjectivity / consciousness to have inherent value, and by extent are sad if a person dies even if they didn't (subjectively) suffer.

I would not personally consider the death of a sentient being with decades of experiences a neutral event, even if the being had been programmed to not have a capacity for suffering.

I think the idea of there being a difference between an ant dying (or "disapearing" if that's less loaded) vs a duck dying makes sense to most people (and is broadly shared) even if they don't have a completely fleshed out system of when something gets moral consideration.

Chance-Device 10 minutes ago

Sure, because you’re a human. We have social attachment to other humans and we mourn their passing, that’s built into the fabric of what we are. But that has nothing to do with whoever has passed away, it’s about us and how we feel about it.

It’s also about how we think about death. It’s weird in that being dead probably isn’t like anything at all, but we fear it, and I guess we project that fear onto the death of other entities.

I guess my value system says that being dead is less bad than being alive and suffering badly.

suddenlybananas 2 hours ago

I know I feel experience. I don't know for sure if you do, but it seems a very reasonable extension to other people. LLMs are a radical jump though that needs a greater degree of justification.

Chance-Device 38 minutes ago

And what kind of evidence would convince you? What experiment would ever bridge this gap? You’re relying entirely on similarity between yourself and other humans. This doesn’t extend very well to anything, even animals, though more so than machines. By framing it this way have you baked in the conclusion that nothing else can be conscious on an a priori basis?

suddenlybananas 5 minutes ago

I'm not sure what evidence would convince me, but I don't think the way LLMs act is convincing enough. The kinds of errors they make and the fact they operate in very clear discrete chunks makes it seem hard to me to attribute them subjective experience.

thrance 44 minutes ago