Yeah, that’s part of it. Also maybe that it can have preferences about “state of the world right now and in the immediate future”, and not just “state of the world in the distant future”.
For example, if an ASI wants me to ultimately wind up with power, that’s a preference about the distant future, so its best bet might be to forcibly imprison me somewhere safe, gather maximum power for itself, and hand that power to me later on. Whereas if an ASI wants me to retain power continuously, then presumably the ASI would be corrigible. But “me retaining power” is something about the state of the world, not directly about the ASI’s strategies and plans, IMO.
(Also, “expect” is not quite right: I was just saying that I don’t find a certain argument convincing, not that this definitely isn’t a problem. I’m still pretty unsure. And until I have a concrete plan that I expect to work, I remain very open to the possibility that finding such a plan is (even) harder than I realize.)