Even in environments where the agent is “alone”, we may still expect the agent to have the following potential convergent instrumental values
Right. But I sometimes bump into reasoning that feels like “instrumental convergence, smart AI, & humans exist in the universe → bad things happen to us / the AI finds a way to hurt us”; I think this implication is usually true, but not necessarily true, and this extreme example illustrates how it can fail. (And note that the AGI could still hurt us in a sense, by simulating and torturing humans using its compute. And some decision theories do seem to have it do that kind of thing.)
(Edited post to clarify)