I have noticed in discussions of AI alignment here that there is a particular emphasis on scenarios where a single entity controls the course of the future. In particular, I have seen the idea of a pivotal act (an action that steers the state of the universe such that, a billion years from now, it is better than it otherwise would have been) floating around rather a lot, and the term seems to be used primarily in the context of “an unaligned AI will almost certainly steer the future in ways that do not include living humans, and the only way to prevent this is for someone to build an aligned AI which will prevent any unaligned AIs from coming into existence at any time in the future”.
I can think of several possible explanations for this emphasis; here are six possibilities:
1. A unipolar world is quite likely, because the first entity to achieve a certain level of capabilities will rapidly bootstrap its capabilities to the level of “much more capable than all other entities combined”.
   My impression is that this is the mainstream LW viewpoint. However, our current trajectory looks to me more like “a very large number of instances of language models learn to create and use tools, where those tools include other specialized language models” and less like “a single coherent entity rapidly gains information about the world and itself, and uses that information to optimize itself such that its effective computational power is a significant fraction of all the computational power in the world”.
   If this is indeed the mainstream viewpoint, I think it would entail a rejection of the bitter lesson. So in this case, my question is whether there are any experiments we could run that would falsify the bitter lesson (without, obviously, dooming everyone).
2. A world where a pivotal act is possible is unlikely, and things will generally be fine (along the current trajectory) in worlds where pivotal acts are not possible. But in the unlikely event that we live in a world where a pivotal act is possible, we would want someone to carry out that act, because otherwise we would be doomed in short order.
3. A world where a pivotal act is possible is unlikely, but it is the only world where we survive, so we should focus on that scenario in the spirit of playing to our outs. We are doomed in worlds where a pivotal act is not possible, not just on our current trajectory, but on every conceivable trajectory.
4. Pivotal acts would still be possible in a meaningfully multipolar world because [reasons; I really can’t imagine how this could be true, but that may be a failure of imagination on my part].
5. Any seemingly multipolar world will become effectively unipolar, even if there are multiple powerful agents within that world, because those agents will be able to come to an agreement with each other and collectively execute the pivotal act.
6. Something else entirely.
I am curious which of these explanations (if any) is driving the discussion behind the “pivotal act” framing. (I notice that I have several internal Eliezers screaming about different things in this question, and I am not sure whether the real one would fall under (1), (3), or (6) -- I’d probably predict 50% / 15% / 25%, with the remaining 10% smeared across the other options.) I ask because the optimal strategy seems very different depending on which world we live in.