This counterfactual AI is motivated to take nice actions only in worlds where the president died. It might not even know what “nice” means in other worlds.
And even if it did know the correct answer to that question, how can you be sure it wouldn’t lie to you in order to achieve its real goals? You can’t really trust the AI unless you are sure it is nice, or at least indifferent.