under a value-aligned sovereign AI, if my nose is itchy then it should get scratched, all else equal
Well, if the AI can make my nose not itch in the first place, I’m OK with that too. Whereas I wouldn’t make an analogous claim about things that I “value”, by my definition of “value”. If I really want to have children, I’m not OK with the AI removing my desire to have children, as a way to “solve” that “problem”. That’s more of a “value” and not just a desire.
That’s the sort of reasoning which should naturally show up in a Value RL style system capable of nontrivial model structure learning.
I’m not sure what point you’re making here. If human brains run on Value RL style systems (which I think I agree with), and humans in fact do that kind of reasoning, then tautologically, that kind of reasoning is a thing that can show up in Value RL style systems.
Still, there’s a problem that it’s possible for some course-of-action to seem appealing when I think about it one way, and unappealing when I think about it a different way. Ego-dystonic desires like addictions are one example of that, but it also comes up in tons of normal situations like deciding what to eat. It’s a problem in the sense that it’s unclear what a “value-aligned” AI is supposed to be doing in that situation.
Cool, I think we agree here more than I thought based on the comment at the top of the chain. I think the discussion in our other thread is now better aimed than this one.