Teaching Claude Why
43 points by pretext 5 hours ago | 5 comments

roenxi 7 minutes ago
One of the lessons of philosophy is that under any particular value system you adopt, almost all philosophers turn out to be either immoral or caught up in meaningless and trivial quibbles. This sort of alignment work is quite interesting because it looks like we might be about to re-tread the history of philosophy at a speedrun pace in the AI world. It'll be interesting to watch.

For anyone who isn't keeping up, there is also work being done [0] to understand how models represent ethical considerations internally. Mainly, one suspects, to make open models less ethical on demand rather than to support alignment. It turns out that models tend to learn some sort of internal "how moral is this?" axis that they use when refusing queries, and that axis can be identified and interfered with.

[0] https://github.com/p-e-w/heretic
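To make that concrete: the usual trick in this space is a difference-of-means probe. Below is a minimal sketch of the idea, my own illustration rather than heretic's actual code; all names and data are hypothetical, and a real implementation hooks a specific transformer layer and applies the projection to every token's hidden state:

    import numpy as np

    def refusal_direction(refused_acts, answered_acts):
        # Each input is (n_prompts, hidden_dim): the model's activations
        # at some layer for prompts it refuses vs. prompts it answers.
        d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
        return d / np.linalg.norm(d)  # unit vector along the putative axis

    def ablate(hidden, direction):
        # Remove each hidden state's component along the direction.
        return hidden - np.outer(hidden @ direction, direction)

    # Demo on synthetic data: after ablation, the projection onto the
    # axis is effectively zero, i.e. the "signal" has been removed.
    rng = np.random.default_rng(0)
    refused = rng.normal(size=(64, 512)) + 2.0
    answered = rng.normal(size=(64, 512))
    axis = refusal_direction(refused, answered)
    print(np.abs(ablate(refused, axis) @ axis).max())  # ~0

The interesting (or worrying) part is how little machinery is involved: one mean-difference vector and one projection is often enough to measurably change refusal behavior.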

reply
soletta 59 minutes ago
This reinforces my suspicion that alignment, and training in general, are closer to being pedagogical problems than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I’m not sure if asking educators is the right answer, but it’s one place to start.
reply
plastic-enjoyer 22 minutes ago
inb4 there will be a whole new field of research that is basically psychology / pedagogy for AI. Who will be the Sigmund Freud of AI?
reply
cyanydeez 18 minutes ago
you mean someone who is completely wrong, spreads a problematic understanding of psychology, and delays real progress for decades because smart people spend fruitless years trying to find a use for it.

...I think we might already have those people running AI companies.

reply
bicx 16 minutes ago
Side note: Anthropic has done well at achieving an immediately recognizable art style.
reply