If not “a slave to its utility function”, then what would a superintelligence be like? Would it be constantly modifying its utility function?
I think a superintelligence would start with an almost arbitrary utility function, one very sensitive to initial conditions, then slightly modify it into a self-consistent one and keep that forever. It almost never makes sense, according to your old utility function, to change it to a new one.
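(A toy sketch of that last point, with made-up payoffs rather than anything from the thread: if the switch is evaluated by the current utility function, its only effect is that the future self optimizes the new function instead, which can't score better under the old one.)

```python
# Toy illustration (hypothetical payoffs): an agent scoring the decision
# "keep my utility function" vs. "switch to a new one" using its *current*
# utility function. Future actions are chosen by whichever function it ends
# up with, so switching can only lose value as measured by the old one.

actions = ["A", "B", "C"]
u_old = {"A": 10, "B": 4, "C": 1}   # current utility function (made-up values)
u_new = {"A": 2, "B": 9, "C": 5}    # candidate replacement (made-up values)

def future_action(utility):
    """The agent's future self picks the action its then-current utility favors."""
    return max(actions, key=lambda a: utility[a])

value_if_keep = u_old[future_action(u_old)]     # evaluated by the old function
value_if_switch = u_old[future_action(u_new)]   # still evaluated by the old function

print(value_if_keep, value_if_switch)  # 10 vs. 4: keeping the old function wins
```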
Goals defined for a person who is not already a formal agent are a living thing: a computational process built from that person's possible behaviors and decisions in various hypothetical situations. Such goals are not even conceptually prior to those behaviors, though there is still an advantage in formulating them as an unchanging computation that defines the target for an external agency aiming to act in alignment with that person's own aims. But that computation is never fully computed, and it can only be computed further through the decisions of the person who defines it as their goals.
I agree that a human doesn't have cleanly defined goals, and I agree with most of the additional nuances in your comment to the extent that I can understand them. But the OP is talking about a superintelligence, and I think modelling a superintelligence as having a constant-across-time utility function is appropriate.
An aligned superintelligence would work with goals of the same kind, even if it's aligned to early AGIs rather than humans. Goals-as-computations may be constant, just as the code of a program may be constant, but what's known about their behavior isn't constant. So the way they guide an agent's actions develops as the computation is carried further, ultimately according to the decisions of the underlying humans/AGIs (and their future iterations) in various hypothetical situations. Also, an uplifted (grown-up) human could personally be a superintelligence; it's not a different kind of thing with respect to the values it could have.
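A minimal sketch of the goals-as-computations analogy, in my own hypothetical framing (the class and decision procedure below are illustrative, not anything defined in the thread): the "code" of the goal never changes, but what is known about it grows only as the underlying person actually decides in more situations.

```python
# Minimal sketch (hypothetical framing): a goal as a fixed computation over
# hypothetical situations. The definition never changes, but its behavior is
# only known for the situations that have actually been "computed", i.e.
# where the underlying person has made a decision.

class GoalAsComputation:
    def __init__(self, decide):
        self.decide = decide   # fixed "code" of the goal: how the person would decide
        self.computed = {}     # what is known about the goal so far

    def evaluate(self, situation):
        # Computing the goal further requires an actual decision in that situation.
        if situation not in self.computed:
            self.computed[situation] = self.decide(situation)
        return self.computed[situation]

# Hypothetical stand-in for the person's decision process.
goal = GoalAsComputation(decide=lambda s: f"what I'd choose given {s!r}")

goal.evaluate("some hypothetical situation")  # now known
print(goal.computed)   # knowledge of the goal grew; its definition did not
```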