Vladimir_Nesov comments on arisAlexis’s Shortform

Vladimir_Nesov 25 Mar 2023 0:28 UTC
I think this is an important equivocation (direct alignment vs. transitive alignment). If the first AGIs, such as LLMs, turn out to be aligned at least in the sense of keeping humanity safe, that by itself doesn’t exempt them from the reach of Moloch. The reason alignment is hard is that it might take longer to figure out than developing misaligned AGIs. This doesn’t automatically stop applying when the researchers are themselves aligned AGIs. AGI-assisted (or more likely, AGI-led) alignment research is faster than human-led alignment research, but so is AGI-led capability research.

Thus it’s possible that P(first AGIs are misaligned) is low, that is, the first AGIs are directly aligned, while P(doom) is still high. This happens if the first AGIs fail to protect themselves (and by extension humanity) from future misaligned AGIs that they themselves develop (they are not transitively aligned, same as most humans), because they failed to establish the strong coordination norms required to prevent deployment of dangerous misaligned AGIs anywhere in the world.
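
To spell out why the two probabilities can come apart, here is a rough sketch using the law of total probability. The symbols are my own shorthand, not anything from the thread: M is the event "first AGIs are misaligned" and D is the event "doom". The numbers in the last line are purely illustrative assumptions, not estimates.

```latex
\begin{align*}
P(D) &= P(M)\,P(D \mid M) + P(\neg M)\,P(D \mid \neg M) \\
     &\ge \bigl(1 - P(M)\bigr)\,P(D \mid \neg M) \\
% Hypothetical values: P(M) = 0.1, P(D | not M) = 0.6
     &= (1 - 0.1) \times 0.6 = 0.54
\end{align*}
```

So even with P(M) small, P(D) stays high whenever P(D | ¬M) is high. That second term is what transitive alignment governs: it is the chance that directly aligned first AGIs still fail to prevent later misaligned ones.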

At the same time, this is not about the timespan. As soon as the first AGIs develop nanotech, they are going to operate on many orders of magnitude more custom hardware, which will increase both the serial speed and the scale of available computation to the point where everything related to settling into an alignment security equilibrium happens within a very short span of physical time. It might take the first AGIs a couple of years to get there (if they manage to restrain themselves and not build a misaligned AGI even earlier), but then within a few weeks it’s all going to get settled, one way or the other.