I have another “objection”, although it’s not a very strong one, and more of just a comment.
One reason game theory reasoning doesn’t work very well in predicting human behavior is because games are always embedded in a larger context, and this tends to wreck the game-theory analysis by bringing in reputation and collusion as major factors. This seems like something that would be true for AIs as well (e.g. “the code” might not tell the whole story; I/”the AI” can throw away my steering wheel but rely on an external steering-wheel-replacing buddy to jump in at the last minute if needed).
In apparent contrast to much of the rationalist community, I think by default one should probably view game theoretic analyses (and most models) as “just one more way of understanding the world” as opposed to “fundamental normative principles”, and expect advanced AI systems to reason more heuristically (like humans).
But I understand and agree with the framing here as “this isn’t definitely a problem, but it seems important enough to worry about”.