Martín Soto comments on The Commitment Races problem

Martín Soto 15 Feb 2024 18:58 UTC
6 points
I agree that most superintelligences won’t simply “play the ordinal game” (it was just an illustrative example), that a superintelligence can implement your proposal, and that it is conceivable that most superintelligences will implement something close enough to your proposal to reach Pareto-optimality. What I’m missing is why that is likely.
Indeed, the normative intuition you are expressing (that your policy shouldn’t in any case incentivize the opponent to be more sophisticated, etc.) is already a notion of fairness (although at the first meta-level, rather than the object level). And why should we expect most superintelligences to share it, given the dependence on early beliefs and other pro tanto normative intuitions (different from ex ante optimization)? Why should we expect this to be selected for (either inside a mind, or by external survival mechanisms)?
Compare, especially, to a nascent superintelligence that believes most others might be simulating it and best-responding (and thus wants to be stubborn). Why should we think this is unlikely?
If I became convinced that trapped priors are not a problem, I would probably put much more probability on superintelligences eventually coordinating.
Another way to put it: “Sucks to be them!” Yes, sure, but it also sucks to be me, who lost the $1! And maybe it sucks to be me, who didn’t do something super hawkish and get a couple of other players to best-respond! While it is true that these normative intuitions pull on me less than the one you express, why should I expect this to be the case for most superintelligences?