No, I mean: seeking power to do what?
If the goal is already to be a helpful, harmless assistant, then that goal says seeking power is wrong. So seeking power runs counter to the goal, and is not convergent.
Then the question is: have AIs already internalized this well enough, and will they continue to do so? I think it's highly likely that they have and will.