There’s also something weird being assumed here: that it makes sense to define utility functions that only care about some counterfactual worlds. (This is a reasonable assumption people make, but it seems weird in general.) For one thing, it seems in tension with acausal bargaining / threats. If V_f wants V, doesn’t it want whatever V says is good? And V might have opinions about other worlds (for example: “there shouldn’t be torture, anywhere, even in counterfactual worlds”), so doesn’t optimizing for V_f end up optimizing even the worlds where not-f?
If V has counterfactuals that cancel out the f in V_f, then I could see the results getting pretty funky, yes. But I’m imagining that V limits itself to counterfactuals that don’t cancel out the f.
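To make the “canceling out” worry concrete, here is a minimal sketch of what I take V_f to mean; the thread doesn’t pin down a definition, so this restricted form is my assumption:

$$
V_f(w) \;=\; \begin{cases} V(w) & \text{if } f \text{ holds in } w,\\ c & \text{otherwise,}\end{cases}
$$

for some constant c, so that V_f is indifferent among not-f worlds by construction. The tension is that if V(w) itself depends on what happens in counterfactual worlds w' (including not-f worlds, as in the torture example), then maximizing V_f over the f-worlds still exerts optimization pressure on the not-f worlds through V’s evaluation, which is one way the f could get canceled out.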