The normative pull of your proposed procedure seems to come from a preconception that “the other player will probably best-respond to me” (and thus, my procedure is correctly shaping its incentives).
But instead we can consider the other player trying to get us to best-respond to them, by jumping up a meta-level: the player checks whether I am playing your proposed policy with a certain notion of fairness $X (which in your case is $5), and punishes accordingly to how far their notion of fairness $Y is from my $X, so that I (if I were to best-respond to his policy) would be incentivized to adopt notion of fairness $Y.
It seems clear that, for the exact same reason your argument might have some normative pull, this other argument has some normative pull in the opposite direction. It then becomes unclear which has stronger normative pull: trying to shape the incentives of the other (because you think they might play a policy one level of sophistication below yours), or trying to best-respond to the other (because you think they might play a policy one level of sophistication above yours).
I think this is exactly the deep problem, the fundamental trade-off, that agents face in both empirical and logical bargaining. I am not convinced all superintelligences will resolve this trade-off in similar enough ways to allow for Pareto-optimality (instead of falling for trapped priors i.e. commitment races), due to the resolution’s dependence on the superintelligences’ early prior.
I am denying that superintelligences play this game in a way that looks like “Pick an ordinal to be your level of sophistication, and whoever picks the higher ordinal gets $9.” I expect sufficiently smart agents to play this game in a way that doesn’t incentivize attempts by the opponent to be more sophisticated than you, nor will you find yourself incentivized to try to exploit an opponent by being more sophisticated than them, provided that both parties have the minimum level of sophistication to be that smart.
If faced with an opponent stupid enough to play the ordinal game, of course, you just refuse all offers less than $9, and they find that there’s no ordinal level of sophistication they can pick which makes you behave otherwise. Sucks to be them!
I agree most superintelligences won’t do something which is simply “play the ordinal game” (it was just an illustrative example), and that a superintelligence can implement your proposal, and that it is conceivable most superintelligences implement something close enough to your proposal that they reach Pareto-optimality. What I’m missing is why that is likely.
Indeed, the normative intuition you are expressing (that your policy shouldn’t in any case incentivize the opponent to be more sophisticated, etc.) is already a notion of fairness (although in the first meta-level, rather than object-level). And why should we expect most superintelligences to share it, given the dependence on early beliefs and other pro tanto normative intuitions (different from ex ante optimization)? Why should we expect this to be selected for? (Either inside a mind, or by external survival mechanisms) Compare, especially, to a nascent superintelligence who believes most others might be simulating it and best-responding (thus wants to be stubborn). Why should we think this is unlikely? Probably if I became convinced trapped priors are not a problem I would put much more probability on superintelligences eventually coordinating.
Another way to put it is: “Sucks to be them!” Yes sure, but also sucks to be me who lost the $1! And maybe sucks to be me who didn’t do something super hawkish and got a couple other players to best-respond! While it is true these normative intuitions pull on me less than the one you express, why should I expect this to be the case for most superintelligences?
The normative pull of your proposed procedure seems to come from a preconception that “the other player will probably best-respond to me” (and thus, my procedure is correctly shaping its incentives).
But instead we can consider the other player trying to get us to best-respond to them, by jumping up a meta-level: the player checks whether I am playing your proposed policy with a certain notion of fairness $X (which in your case is $5), and punishes accordingly to how far their notion of fairness $Y is from my $X, so that I (if I were to best-respond to his policy) would be incentivized to adopt notion of fairness $Y.
It seems clear that, for the exact same reason your argument might have some normative pull, this other argument has some normative pull in the opposite direction. It then becomes unclear which has stronger normative pull: trying to shape the incentives of the other (because you think they might play a policy one level of sophistication below yours), or trying to best-respond to the other (because you think they might play a policy one level of sophistication above yours).
I think this is exactly the deep problem, the fundamental trade-off, that agents face in both empirical and logical bargaining. I am not convinced all superintelligences will resolve this trade-off in similar enough ways to allow for Pareto-optimality (instead of falling for trapped priors i.e. commitment races), due to the resolution’s dependence on the superintelligences’ early prior.
I am denying that superintelligences play this game in a way that looks like “Pick an ordinal to be your level of sophistication, and whoever picks the higher ordinal gets $9.” I expect sufficiently smart agents to play this game in a way that doesn’t incentivize attempts by the opponent to be more sophisticated than you, nor will you find yourself incentivized to try to exploit an opponent by being more sophisticated than them, provided that both parties have the minimum level of sophistication to be that smart.
If faced with an opponent stupid enough to play the ordinal game, of course, you just refuse all offers less than $9, and they find that there’s no ordinal level of sophistication they can pick which makes you behave otherwise. Sucks to be them!
I agree most superintelligences won’t do something which is simply “play the ordinal game” (it was just an illustrative example), and that a superintelligence can implement your proposal, and that it is conceivable most superintelligences implement something close enough to your proposal that they reach Pareto-optimality. What I’m missing is why that is likely.
Indeed, the normative intuition you are expressing (that your policy shouldn’t in any case incentivize the opponent to be more sophisticated, etc.) is already a notion of fairness (although in the first meta-level, rather than object-level). And why should we expect most superintelligences to share it, given the dependence on early beliefs and other pro tanto normative intuitions (different from ex ante optimization)? Why should we expect this to be selected for? (Either inside a mind, or by external survival mechanisms)
Compare, especially, to a nascent superintelligence who believes most others might be simulating it and best-responding (thus wants to be stubborn). Why should we think this is unlikely?
Probably if I became convinced trapped priors are not a problem I would put much more probability on superintelligences eventually coordinating.
Another way to put it is: “Sucks to be them!” Yes sure, but also sucks to be me who lost the $1! And maybe sucks to be me who didn’t do something super hawkish and got a couple other players to best-respond! While it is true these normative intuitions pull on me less than the one you express, why should I expect this to be the case for most superintelligences?