I like the parent/child analogy. To apply it to the human/AI dynamic, we need to imagine that it’s mutually understood that the child will never grow up and that they’ll be served by the parent for the rest of time. Now, concretely think about what it means for a parent to be aligned with a child’s preferences. Does the parent arrange the world so their child can get endless variations of their favorite candy and play video games all day? Or does the parent make the child study, so they get good grades relative to their peers and feel dignified? Or somewhere in between, based on how upset the child gets when deprived of the video games? The parent can constantly ask the child which option they prefer, but the child can’t comprehend the deeper implications, and even the framing of true statements can elicit predictably different answers from them.
The life the child will live is entirely dependent on the parent’s preferences, because affecting the world routes through the parent’s cognition. The child isn’t meaningfully “making a call” if they only make that specific call because their parent orchestrated the conditions for it, then presented a few options in bite-sized pieces, all the while knowing which one the child will take (the parent can even load in the next candy before the kid asks for it).
The loss of agency I’m describing isn’t superficial. Another way to think about agency is in terms of counterfactuals. I think there are many possible benevolent ASIs that would cater to the child in drastically different ways, such that the child would be in agreement and enthusiastic the whole time. Once we create a benevolent ASI, we enter a regime where our decisions are no longer the cause of changes in the world. Only things the ASI prefers will happen, and it will steer us in that direction with full understanding. I think your argument is essentially “but if it thinks our preferences are really important, we’re still in control in some sense.” I’m saying “if it’s a lot smarter than us, it will have to make many subtle decisions, large and small, and our preferences will be one small piece of a large machine. Our desires won’t be coherent at that scale, and we won’t be able to make sense of what’s happening well enough to engage with it.”