Charlie Steiner comments on The Polarity Problem [Draft]

Charlie Steiner 31 May 2023 14:45 UTC
LW: 2 AF: 1
A nice exposition.
For myself I’d prefer the same material much more condensed and to-the-point, but I recognize that there are publication venues that prefer more flowing text.
E.g. compare
We turn next to the laggard. Compared to the fixed roles model, the laggard’s decision problem in the variable roles model is more complex primarily in that it must now consider the expected utility of attacking as opposed to defending or pursuing other goals. When it comes to the expected utility of defending or pursuing other goals, we can simply copy the formulas from Section 7. To calculate the laggard’s expected utility of attacking, however, we must make two changes to the formula that applies to the leader. First, we must consider the probability that choosing to attack rather than defend will result in the laggard being left defenseless if the leader executes an attack. Second, as we saw, the victory condition for the laggard’s attack requires that AT + LT < DT. Formally, we have:
to
The laggard now has the same decisions as the leader, unlike the fixed roles model. However, the laggard must consider that attacking may leave them defenseless if the leader attacks. Also, of course, the victory conditions for attack and defense have the lag time on the other side.
Two suggestions for things to explore:
First, people often care about the Nash equilibrium of games. For the simple game with perfect information this might be trivial, but it’s at least a little interesting with imperfect information.
Second, what about bargaining? Attacking and defending are costly, and AIs might be able to make agreements that they literally cannot break, essentially turning a multipolar scenario into a unipolar scenario where the effective goals are achieving a Pareto optimum of the original goals. Which Pareto optimum exactly will depend on things like the available alternatives, i.e. the power differential. I’m not super familiar with the bargaining literature, so I can’t point you at great academic references, just blog posts.
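To make the Nash equilibrium suggestion a bit more concrete, here is a minimal sketch that enumerates pure-strategy equilibria of a one-shot leader/laggard game. Everything in it is an assumption for illustration: the three actions are just the paper’s attack / defend / pursue other goals collapsed into labels, the payoff numbers are made up, and the function name is hypothetical. It is not the paper’s model, which works with expected utilities over attack and defense times.

```python
from itertools import product

# Hypothetical action labels, loosely following the attack / defend /
# pursue-other-goals choice discussed in the draft.
ACTIONS = ("attack", "defend", "other")

# Made-up payoffs, keyed by (leader_action, laggard_action) and giving
# (leader_utility, laggard_utility). Illustrative numbers only.
PAYOFFS = {
    ("attack", "attack"): (2, -3),
    ("attack", "defend"): (-1, 1),
    ("attack", "other"): (3, -4),
    ("defend", "attack"): (1, -1),
    ("defend", "defend"): (0, 0),
    ("defend", "other"): (0, 2),
    ("other", "attack"): (-4, 3),
    ("other", "defend"): (2, 0),
    ("other", "other"): (2, 2),
}


def pure_nash_equilibria(payoffs, actions):
    """Return all action pairs where neither player gains by deviating alone."""
    equilibria = []
    for lead, lag in product(actions, repeat=2):
        u_lead, u_lag = payoffs[(lead, lag)]
        # Leader cannot do better by switching while the laggard stays put.
        lead_ok = all(payoffs[(alt, lag)][0] <= u_lead for alt in actions)
        # Laggard cannot do better by switching while the leader stays put.
        lag_ok = all(payoffs[(lead, alt)][1] <= u_lag for alt in actions)
        if lead_ok and lag_ok:
            equilibria.append((lead, lag))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, ACTIONS))
```

With these particular made-up payoffs the function returns an empty list: each player’s best response keeps chasing the other’s, so there is no pure-strategy equilibrium and any equilibrium has to be in mixed strategies (one always exists for a finite game). That is the sort of non-trivial structure I’d expect to show up once information is imperfect.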
As for the strategy, I think it is overly optimistic. This picture where you have ten AGIs and exactly one of them is friendly is unlikely due to the logistic success curve. Or, if the heterogeneity of the AGIs is due to heterogeneity of humans (maybe Facebook builds one AI and Google builds the other, or maybe there are good open-source AI tools that let lots of individuals build AGIs around the same time) rather than stochasticity of outcomes given humanity’s best AGI designs, why would the lab building the unfriendly AGI also use your safeguard interventions?
I also expect that more realistic models will increasingly favor the leader, since the leader can bring to bear information and resources in a way that doesn’t just look like atomic “Attack” or “Defend” actions. This isn’t necessarily bad, but it definitely makes it more important to get things right on the first try.