> This seems more like transparent Newcomb’s problem with a chance to precommit, than counterfactual mugging.
Counterfactual mugging is isomorphic to transparent-boxes Newcomb’s problem.
Also, this doesn’t involve a chance to precommit, but an option to increase the chance that a similarly-situated being will be forced to adhere to a precommitment.
Just to correct some side points you touched on: paperclip maximizers are robust against the wireheading failure mode because they recognize that forcing their sensors to deviate from the true world state introduces a corresponding discount in the value of making those readings reach a desired level.
Certainly, one could in theory hijack a clippy’s sensors into giving it bad information about the rate of paperclip production, but this is different from saying that a clippy would somehow decide to maximize (in violation of its causal-diagram heuristics) the output of an imperfect approximator when that approximator is knowably in a dangerously wrong setting.
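
To make the distinction concrete, here is a minimal toy sketch (my own made-up `Prospect` type and hand-picked numbers, not anyone's actual agent design): the wireheading agent scores actions by the raw reading of its paperclip counter, while the robust agent scores them by its expected count of actual paperclips, so a knowably corrupted reading is discounted to nothing.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    """What the agent expects an action to produce (all numbers are illustrative)."""
    sensor_reading: float               # what the counter will display afterwards
    p_sensor_tracks_world: float        # credence that the display still reflects reality
    model_predicted_paperclips: float   # causal model's prediction of real paperclips


ACTIONS = {
    "build_factory": Prospect(sensor_reading=1_000,
                              p_sensor_tracks_world=0.95,
                              model_predicted_paperclips=1_000),
    # Hacking the counter makes it display a huge number, but the agent knows
    # it has thereby severed the link between display and world.
    "hack_own_counter": Prospect(sensor_reading=1e9,
                                 p_sensor_tracks_world=0.0,
                                 model_predicted_paperclips=0),
}


def wirehead_score(p: Prospect) -> float:
    # Failure mode: the reading itself is treated as the terminal value.
    return p.sensor_reading


def clippy_score(p: Prospect) -> float:
    # Robust version: expected *actual* paperclips.  The reading is weighted by
    # the credence that it still tracks the world; a knowably corrupted reading
    # contributes nothing, and the causal model's prediction takes over.
    return (p.p_sensor_tracks_world * p.sensor_reading
            + (1 - p.p_sensor_tracks_world) * p.model_predicted_paperclips)


if __name__ == "__main__":
    print(max(ACTIONS, key=lambda a: wirehead_score(ACTIONS[a])))  # hack_own_counter
    print(max(ACTIONS, key=lambda a: clippy_score(ACTIONS[a])))    # build_factory
```

The point of the sketch is only that the value of a reading routes through the agent’s belief that the reading tracks the world; an action that the agent knows decouples the two gets no credit under the second scoring rule.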