Steven Byrnes comments on Disentangling Perspectives On Strategy-Stealing in AI Safety

Steven Byrnes 19 Dec 2021 22:13 UTC
LW: 7 AF: 5
I understood the idea of Paul’s post as: if we start in a world where humans-with-aligned-AIs control 50% of relevant resources (computers, land, minerals, whatever), and unaligned AIs control 50% of relevant resources, and where the strategy-stealing assumption is true—i.e., the assumption that any good strategy that the unaligned AIs can do, the humans-with-aligned-AIs are equally capable of doing themselves—then the humans-with-aligned-AIs will wind up controlling 50% of the long-term future. And the same argument probably holds for 99%-1% or any other ratio. This part seems perfectly plausible to me, if all those assumptions hold.
Then we can talk about why the strategy-stealing assumption is not in fact true. The unaligned AIs can cause wars and pandemics and food shortages and removing-all-the-oxygen-from-the-atmosphere to harm the humans-with-aligned-AIs, but not so much vice-versa. The unaligned AIs can execute a good strategy that the humans-with-aligned-AIs are too uncoordinated to carry out; instead, the latter will just be bickering amongst themselves, hamstrung by following laws and customs and taboos etc., and not having a good coherent idea of what they’re trying to do anyway. The aligned AIs might be less capable than an unaligned AI because of an “alignment tax”: we make them safe by making them less powerful (they act conservatively, there are humans in the loop, etc.). And so on and so forth. All this stuff is in Paul’s post, I think.
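To make the arithmetic behind this concrete, here is a toy sketch. It is purely illustrative and not from Paul's post: the function name `aligned_share`, the compounding-growth model, and the `tax` parameter are all assumptions made up for this example. Each side's resources compound at the same base rate, and an alignment tax simply shrinks the aligned side's rate. With zero tax (strategy-stealing holds), the initial split is preserved; with any persistent tax, the aligned share decays over time.

```python
def aligned_share(initial_share: float, growth: float, tax: float, steps: int) -> float:
    """Fraction of resources held by humans-with-aligned-AIs after `steps` rounds.

    Hypothetical toy model: both sides compound their resources each round.
    `tax` in [0, 1] is the assumed fraction of growth the aligned side gives up
    (tax == 0 corresponds to the strategy-stealing assumption holding exactly).
    """
    aligned = initial_share
    unaligned = 1.0 - initial_share
    for _ in range(steps):
        aligned *= 1.0 + growth * (1.0 - tax)  # aligned side pays the tax
        unaligned *= 1.0 + growth              # unaligned side grows unimpeded
    return aligned / (aligned + unaligned)


if __name__ == "__main__":
    # No tax: the starting ratio carries through, whether 50-50 or 99-1.
    print(aligned_share(0.5, growth=0.1, tax=0.0, steps=100))
    print(aligned_share(0.99, growth=0.1, tax=0.0, steps=100))
    # A persistent 20% tax: the aligned share ends up well below one half.
    print(aligned_share(0.5, growth=0.1, tax=0.2, steps=100))
```

The model ignores the other failure modes listed above (asymmetric destruction, coordination problems), each of which would push the aligned share down further.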
I feel like Paul’s post is a great post in all those details, but I would have replaced the conclusion section with:
“So, in summary, for all these reasons, the strategy-stealing assumption (in this context) is more-or-less totally false and we shouldn’t waste our time thinking about it.”
Paul’s actual conclusion section, by contrast, is kinda the opposite. (See Zvi’s comment along the same lines.)
I feel like a lot of this post is listing reasons that the strategy-stealing assumption is false (e.g. humans don’t know what they’re trying to do and can’t coordinate with each other regardless), which are mostly consistent with Paul’s post. It also notes that there are situations in which we don’t care whether the strategy-stealing assumption is true or false (e.g. unipolar AGI outcomes, situations where all the AIs are misaligned, etc.).
And then other parts of the post are, umm, I’m not sure, sending “something is wrong” vibes that I’m not really understanding or sympathizing with…
What links here?
- Steven Byrnes's comment on Late 2021 MIRI Conversations: AMA / Discussion by Rob Bensinger (6 Mar 2022 15:13 UTC; 6 points)