Teaching Claude Why
43 points by pretext 5 hours ago | 5 comments
soletta 59 minutes ago
This reinforces my suspicion that alignment, and training in general, are closer to being a pedagogical problem than anything else. Given a finite amount of training input, how do we elicit the desired model behavior? I'm not sure if asking educators is the right answer, but it's one place to start.
reply
plastic-enjoyer 22 minutes ago
inb4 there will be a whole new field of research that is basically psychology / pedagogy for AI. Who will be the Sigmund Freud of AI?
reply
For anyone who isn't keeping up, there is also work being done [0] to understand how models represent ethical considerations internally. Mainly, one suspects, to make open models less ethical on demand rather than to support alignment. It turns out that models tend to learn some sort of internal "how moral is this?" axis that governs refusals, and that axis can be identified and interfered with.
[0] https://github.com/p-e-w/heretic
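The "axis that can be identified and interfered with" idea can be sketched in a few lines. A common approach (not necessarily what the linked project does exactly) is to take hidden-state activations from prompts the model refuses and prompts it complies with, use the difference of their means as a "refusal direction," and then project that direction out of new activations. The toy 2-D vectors below are stand-ins for real model activations; all names here are hypothetical.

```python
# Hedged sketch of a "refusal direction": the mean-activation difference
# between refused and complied prompts approximates an internal axis,
# which can then be projected out (ablated). Pure-Python vector math;
# the toy activations below are made up for illustration.

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def ablate(activation, direction):
    """Remove the component of `activation` along unit vector `direction`."""
    coeff = dot(activation, direction)
    return [a - coeff * d for a, d in zip(activation, direction)]

# Toy activations: refused prompts cluster high on the first dimension.
refused  = [[3.0, 0.1], [2.8, -0.2], [3.2, 0.0]]
complied = [[0.1, 0.2], [-0.1, 0.1], [0.0, -0.1]]

direction = normalize(sub(mean(refused), mean(complied)))

# After ablation, a refused-looking activation has ~zero projection
# onto the refusal axis.
steered = ablate([3.1, 0.05], direction)
print(abs(dot(steered, direction)) < 1e-9)  # → True
```

In a real setting the vectors would come from a transformer's residual stream at some layer, and the ablation would be applied at inference time (or baked into the weights), which is why this kind of intervention works on open-weight models in particular.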