TBH, I am struggling with the idea that an AI intent on maximising a thing doesn’t have that thing as a goal. Whether or not the goal was intended seems irrelevant to whether or not the goal exists in the thought experiment.
“Goal stability is almost certainly attained in some sense given sufficient competence”
I am really not sure about this, actually. Flexible goals are a universal feature of successful thinking organisms. I would expect natural selection to kick in at least over sufficient scales (light delay making co-ordination progressively harder on galactic scales), causing drift. But even on small scales, if an AI has, say, 1000 competing goals, I would find it surprising if its goals were actually totally fixed in any practical sense, even if it were superintelligent. Any number of things could change over time, such that locking yourself into fixed goals could be seen as a long-term risk to optimisation for any goal.
“Alignment is not just absence of value drift, it’s also setting the right target, which is a very confused endeavor because there is currently no legible way of saying what that should be for humanity”—totally agree with that!
“AIs themselves might realize that (even more robustly than humans do), ending up leaning in favor of slowing down AI progress until they know what to do about that”—god I hope so haha
Thanks for this!