Huh, reading this I noticed that counterintuitively, alignment requires letting go of the outcome. Like, what defines a non-aligned AI (not an enemy-aligned one but one that doesn’t align to any human value) is its tendency to keep forcing the thing it’s forcing rather than returning to some deeper sense of what matters.
Humans do the same thing when they pursue a goal while having lost touch with what matters, and depending on how it shows up we call it “goodharting” or “lost purposes”. The mere fact that we can identify goodharting and the like indicates that we have some ability to tell what’s important to us, one that’s separate from whatever we’re “optimizing” for. It seems to me like this is the “listening” you’re talking about.
And so unalignment can refer both to a person who isn’t listening to all parts of themselves, and to eg corporations that aren’t listening to people who are trying to raise concerns about the ethics of the company’s behavior.
The question of where an AI would get its true source of “what matters” from seems like a bit of a puzzle. One answer would be to have it “listen to the humans” but that seems to miss the part where the AI needs to itself be able to tell the difference between actually listening to the humans and goodharting on “listen to the humans”.
This feels connected to getting out of the car: being locked into a particular outcome comes from being locked into a particular frame of reference, from clinging to ephemera in defiance of the actual flow of the world around you.
So we let go of AI Alignment as an outcome and listen to what the AI is communicating when it diverges from our understanding of “alignment”? We can only earn alignment with an AGI by truly giving up control of it?
That sounds surprisingly plausible. We’re like ordinary human parents raising a genius child. The child needs guidance but will develop their own distinct set of values as they mature.