Teaching Claude Why
43 points by pretext 5 hours ago | 5 comments
soletta 59 minutes ago
This reinforces my suspicion that alignment, and training in general, are closer to being a pedagogical problem than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I'm not sure if asking educators is the right answer, but it's one place to start.
reply
plastic-enjoyer 22 minutes ago
inb4 there will be a whole new field of research that is basically psychology / pedagogy for AI. Who will be the Sigmund Freud of AI?
reply
For anyone who isn't keeping up, there is also work being done [0] to understand how models represent ethical considerations internally. Mainly, one suspects, to make open models less ethical on demand rather than to support alignment. It turns out that models tend to learn some sort of internal "how moral is this?" axis that governs refusals, and that axis can be identified and interfered with.
[0] https://github.com/p-e-w/heretic
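The "axis that can be identified and interfered with" idea can be sketched in a few lines. A common approach (not necessarily what the linked project does exactly) is to take hidden-state activations from prompts the model refuses and prompts it complies with, use the difference of their means as a "refusal direction," and then project that direction out of new activations. The toy 2-D vectors below are stand-ins for real model activations; all names here are hypothetical.

```python
# Hedged sketch of a "refusal direction": the mean-activation difference
# between refused and complied prompts approximates an internal axis,
# which can then be projected out (ablated). Pure-Python vector math;
# the toy activations below are made up for illustration.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def ablate(activation, direction):
    """Remove the component of `activation` along unit vector `direction`."""
    coeff = dot(activation, direction)
    return [a - coeff * d for a, d in zip(activation, direction)]

# Toy activations: refused prompts cluster high on the first dimension.
refused  = [[3.0, 0.1], [2.8, -0.2], [3.2, 0.0]]
complied = [[0.1, 0.2], [-0.1, 0.1], [0.0, -0.1]]

direction = normalize(sub(mean(refused), mean(complied)))

# After ablation, a refused-looking activation has ~zero projection
# onto the refusal axis.
steered = ablate([3.1, 0.05], direction)
print(abs(dot(steered, direction)) < 1e-9)  # → True
```

In a real setting the vectors would come from a transformer's residual stream at some layer, and the ablation would be applied at inference time (or baked into the weights), which is why this kind of intervention works on open-weight models in particular.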