Well, as long as SA is wired to “get out of the way if A starts moving”, the optimal R-maximising policy is always to move towards the red button; any other behaviour is clearly not R-maximising. (Note that SA doesn’t need to “know” anything; it just needs to be programmed with a policy that depends on how A moves, and A itself sets this up so that its movement signals whether or not it is R-maximising.)
But in any case, that specific problem can be overcome with the right rollouts.