I don’t think coordinating a billion copies of GPT-7 is at all what the worried tend to worry about. We worry about a single agent based on GPT-7 self-improving until it can take over single-handedly, perhaps with copies it made itself that are specifically optimized for coordination, perhaps sticking to less intelligent servant agents. The alternative is also a possible route to disaster, but I think things would go off the rails well before then. You’re in good, if minority, company in worrying about slower and more law-abiding takeovers; Christiano’s stance on doom seems to place most of the odds of disaster in these scenarios, for instance. But I don’t understand why either of you sees it as so likely that we partway solve the alignment problem yet don’t use that success to prevent AIs from progressively outcompeting humans. It seems like an unlikely combination of technical success and societal idiocy. Although to be fair, when I phrase it that way, it does sound kind of like our species’ MO :)
On your other contention, that AI will probably follow norms and laws so that takeover attempts are constrained the way coups are: I agree that some of the same constraints may apply, but that is little comfort. It’s technically correct that AIs would probably use whatever avenues are available, including nonviolent and legal ones, to accomplish their goals (and potentially disempower humans).
Assuming AIs will follow norms, laws, and social constraints even when ignoring them would serve their goals better is assuming we’ve almost completely solved alignment. If that happens, great, but that is a technical objective we’re working toward, not an outcome we can assume when thinking about AI safety. LLMs do have powerful norm-following habits; this will be a huge help in achieving alignment if they form the core of AGI, but it does not entirely solve the problem.
I have wondered, in response to similar statements you’ve made in the past: are you including the observation that human history is chock-full of people ignoring norms, laws, and social constraints when they think they can get away with it? I see our current state of civilization as a remarkable achievement that is fragile and must be carefully protected against seismic shifts in the balance of power, including AGI but also other potential destabilizing factors of the sort that have brought down governments and social orders in the past.
In sum, if you’re arguing that AGI won’t necessarily violently take over right away, I agree. If you’re arguing that it wouldn’t do so even if it had the chance, I think that is an entirely technical question of whether we’ve succeeded adequately at alignment.