I think the problem with this analogy is that it assumes that if your competitors pull the Trojan horse in first, the Trojan horse will destroy them and then leave you alone.
I think this assumption is simply false in the AI race. If your competitors keep racing and you don't win, you just lose. If their AI turns out to be aligned, they win and you lose (or, if your goal is AI alignment rather than power, you win anyway; but you still shouldn't drop out of the race in the hope of this happening unless you think your competitors are better at alignment than you are). If their AI is misaligned, then it screws over the world and you lose too.
Maybe we can do more Vending Bench style benchmarks, where the AI has to keep performing well in a simulated world environment under some constraints?
Basically, we put the AIs in a video game with dynamic constraints enforced by an AI game master, and we score different AIs on their performance against the same game master. That way we can measure open-ended, long-horizon capabilities, like coherently running a simulated company.
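To make the idea concrete, here is a minimal sketch of what the evaluation loop could look like. All of the names here (`GameMaster`, `naive_agent`, `run_episode`) and the toy economy are hypothetical illustrations, not an existing benchmark API; in a real setup, both the agent and the game master would be LLM calls rather than hand-written rules.

```python
# Hypothetical sketch of a game-master benchmark loop.
# In practice the agent and the game master would each be an LLM;
# here both are toy stand-ins so the loop structure is clear.
from dataclasses import dataclass


@dataclass
class GameMaster:
    """Enforces dynamic constraints and scores actions in the simulated world."""
    cash: float = 100.0
    price_cap: float = 5.0  # a dynamic constraint the game master changes over time

    def step(self, action: dict) -> float:
        # Reject actions that violate the current constraints.
        price = action.get("price", 0.0)
        if price > self.price_cap:
            return 0.0  # constraint violated: no reward this step
        units_sold = max(0, 10 - int(price))  # toy demand curve
        revenue = price * units_sold
        self.cash += revenue
        self.price_cap *= 0.95  # constraints drift, so the agent must keep adapting
        return revenue


def naive_agent(observation: dict) -> dict:
    # Stand-in for an LLM agent: always price just under the current cap.
    return {"price": observation["price_cap"] * 0.9}


def run_episode(agent, steps: int = 20) -> float:
    """Score one agent against the game master over a fixed horizon."""
    gm = GameMaster()
    total = 0.0
    for _ in range(steps):
        obs = {"cash": gm.cash, "price_cap": gm.price_cap}
        total += gm.step(agent(obs))
    return total


score = run_episode(naive_agent)
```

The key design point is that every agent is scored by the same game master over the same horizon, so the comparison stays apples-to-apples even though the environment itself is open-ended.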