Changing the AI race payoff matrix

Suppose that AI capability research is done, but AI safety research is ongoing. Any of the major players can launch an AI at the press of a button to win the cosmos. The longer everyone waits, the lower the chance that the cosmos ends up as paperclips. The default is that someone presses the button as soon as they prefer their current chance at an intact cosmos to the risk of letting the race go on. This unfortunate situation could be helped by the fact that pressing the button need not be obvious to the other players. So suppose the winner decides to lie low and smite whoever presses the button thereafter*. Then the other players' incentive not to press the button grows over time!

Let the paperclip probability p(t) := e^-t decay exponentially. Let t' be the last time at which the other player wouldn't yet press the button. What mixed button-pressing strategy do we employ to make the get-smitten probability shore up the fading paperclip probability? At time t >= t', we press the button with probability density -p'(t) = e^-t. (This density integrates to e^-t' over [t', infinity), so with probability 1 - e^-t' we never press at all.) Since pressing at time t causes paperclips with probability p(t) = e^-t, the probability that our strategy ever causes paperclips is the integral of e^-t * e^-t from t' to infinity, which is 0.5*e^-2t'.
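A quick Monte Carlo check of that closed form (a sketch under the two-player setup above; the function names and sampling scheme are mine, not from the original):

```python
import math
import random

def simulate(t_prime: float, trials: int = 1_000_000) -> float:
    """Estimate the probability that the mixed strategy ever causes paperclips.

    Strategy: at time t >= t_prime, press with probability density e^-t,
    so the total probability of ever pressing is e^-t_prime.
    Pressing at time t yields paperclips with probability p(t) = e^-t.
    """
    paperclips = 0
    for _ in range(trials):
        # Press at all with probability e^-t_prime.
        if random.random() < math.exp(-t_prime):
            # Conditional on pressing, the press time has density
            # e^-(t - t_prime) on [t_prime, inf): a shifted Exp(1).
            t = t_prime + random.expovariate(1.0)
            # Paperclips with probability p(t) = e^-t.
            if random.random() < math.exp(-t):
                paperclips += 1
    return paperclips / trials

for t_prime in (0.0, 0.5, 1.0):
    estimate = simulate(t_prime)
    exact = 0.5 * math.exp(-2 * t_prime)
    print(f"t'={t_prime}: simulated {estimate:.4f}, closed form {exact:.4f}")
```

The simulated values should match 0.5*e^-2t' to within Monte Carlo noise.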

*He could also just figure out what everyone else would do in any situation and reward them accordingly as a strategy against one-boxers, or copy the planet ten times over as a strategy against thirders, but this variant should work against your average human. (It turns out a large number of strategies become available once you're omnipotent. Suggest more.)