As Nate pointed out, using these counterfactuals to make decisions gives you CDT. I posted an implementation of CDT using this idea a while ago. Here is a more general idea along these lines. It’s also used here and here.
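For concreteness, here is a minimal sketch of that general idea (the toy Newcomb-style payoff and the override mechanism are my own illustration, not the implementation linked above): the agent's action is normally produced by its policy, but with some small positive probability an exogenous randomizer overrides it, and "the counterfactual where I do a" is just ordinary conditioning on the override having selected a.

```python
# A toy sketch, not the linked implementation: the agent's action normally
# comes from its policy, but with some small positive probability an exogenous
# randomizer overrides it and forces an action at random.  "What would happen
# if I did a?" is read as ordinary conditioning on the override having selected
# a, and the agent picks the a with the highest conditional expected utility.

ACTIONS = ("one-box", "two-box")

def utility(policy_action, realized_action):
    # Newcomb-style payoff chosen for illustration: the predictor responds to
    # the policy, while the realized action can differ because of the override.
    big_box = 1_000_000 if policy_action == "one-box" else 0
    small_box = 1_000 if realized_action == "two-box" else 0
    return big_box + small_box

def value_of_counterfactual(policy_action, forced_action):
    # Conditioned on the rare override firing and selecting forced_action,
    # the realized action is forced_action while the predictor's response
    # (which only saw the policy) stays fixed -- the usual CDT separation.
    return utility(policy_action, forced_action)

def choose(policy_action):
    return max(ACTIONS, key=lambda a: value_of_counterfactual(policy_action, a))

# Whatever the current policy is, the counterfactual comparison says two-boxing
# gains an extra $1,000, the characteristic CDT verdict.
print(choose("one-box"), choose("two-box"))   # two-box two-box
```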
In general, conditioning on extremely low-probability events can lead to strange beliefs (e.g. skeptical hypotheses coming to dominate the posterior), which places a lower bound on how small the probability of X=0 can safely be. If you try to give the AI a really confident model of the world, in order to eliminate skeptical hypotheses, then stuff can get weird fast.
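To put rough numbers on this (every prior and likelihood below is invented for the illustration): if X=0 is far less likely under the agent's main world model than under some skeptical hypothesis, then conditioning on X=0 hands most of the posterior to the skeptical hypothesis, so the main model's probability for X=0 can't be pushed too far below the prior weight on skeptical hypotheses.

```python
# A rough numerical illustration; every prior and likelihood is made up.

def posterior_skeptical(prior_skeptical, p_x0_normal, p_x0_skeptical):
    """P(skeptical hypothesis | X = 0) by Bayes' rule."""
    prior_normal = 1.0 - prior_skeptical
    joint_skeptical = prior_skeptical * p_x0_skeptical
    joint_normal = prior_normal * p_x0_normal
    return joint_skeptical / (joint_skeptical + joint_normal)

# Skeptical hypothesis ("my sensors lie / it's all a simulation"):
# tiny prior, but it predicts the freak event X = 0 reasonably well.
prior_skeptical = 1e-6
p_x0_skeptical = 0.5

# If the main model makes X = 0 merely unlikely, conditioning is safe:
print(posterior_skeptical(prior_skeptical, p_x0_normal=1e-2,
                          p_x0_skeptical=p_x0_skeptical))
# ~ 5e-5: the skeptical hypothesis stays negligible after conditioning.

# If the main model makes X = 0 astronomically unlikely, it isn't:
print(posterior_skeptical(prior_skeptical, p_x0_normal=1e-9,
                          p_x0_skeptical=p_x0_skeptical))
# ~ 0.998: conditional on X = 0, the agent mostly believes the
# skeptical hypothesis, so its conditional beliefs get weird.
```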
More seriously, the behavior of these agents is very sensitive to the behavior of humans in extremely improbable cases. For example, you proposed having a button that operates the explosives with probability 99.9999%, and an AI who erroneously believes that the explosives will never go off (or who just makes decisions based on what will happen if the explosives never go off; since the two systems have formally identical behavior, it doesn't seem worth distinguishing). But what do the humans actually do when the button that is supposed to work 99.9999% of the time doesn't? Without careful measures, they probably panic and turn off the machine anyway.
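To make that sensitivity concrete (the payoffs and plan names below are invented for the example): the agent only ever evaluates plans inside the X=0 branch, so swapping one assumption about how the humans react there for another can flip which plan it prefers.

```python
# A toy calculation; payoffs and plan names are made up.  The agent evaluates
# plans only in the X = 0 world (the explosives never fire), so everything
# hinges on what the humans do in that world when the button fails.

def conditional_value(plan, humans_panic_and_shut_off):
    """Agent's utility in the X = 0 branch for a given plan, under one
    assumption about how the humans react when the button fails."""
    if plan == "comply":
        return 10        # modest value from doing the intended task
    # plan == "press-worthy": do something the humans would press the button over
    if humans_panic_and_shut_off:
        return 0         # humans cut the power manually; the agent gets nothing
    return 100           # button failed, nobody intervenes, the agent gets its way

for humans_panic in (False, True):
    best = max(["comply", "press-worthy"],
               key=lambda p: conditional_value(p, humans_panic))
    print(humans_panic, "->", best)
# False -> press-worthy   (the agent misbehaves, counting on the dud button)
# True  -> comply         (the same design behaves completely differently)
```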
In general, it seems better to avoid conditioning on extremely unlikely events for both of these reasons. But conditioning on reasonably likely counterfactuals is a natural thing to do. And for some use cases, this may not be a problem.