I think this is a great project! Clarifying why informed people have such different opinions on AGI x-risk seems like a useful path to improving our odds. I’ve been working on a post on alignment difficulty cruxes that covers much of the same ground.
Your list is a good starting point. I’d add:
Time window of analysis: I think a lot of people give a low p(doom) because they’re only considering the first few years after we get real AGI.
Paul Christiano, for instance, adds a substantial chance that we’ve “irreversibly messed up our future within 10 years of building powerful AI” over and above the odds that we all die from takeover or misuse (in My views on “doom”, April 2023).
Here are my top four cruxes for alignment difficulty, which is a different question from p(doom) but highly overlapping with it:
How AGI will be designed and aligned
How we attempt alignment is obviously important to the odds of success.
How well RL alignment will generalize
Most people are assuming we’ll use RL for alignment; I think we shouldn’t and won’t.
Whether we need to understand human values better
I think we shouldn’t and won’t try to align AGI to human values, but many think we will try and fail.
Whether societal factors are included in alignment difficulty
Sometimes people are just answering whether the AGI will do what we want, and not including whether that will result in doom (e.g., ASI super-weapons proliferate until someone starts a hyper-destructive conflict).
Other important cruxes are mentioned in Stop talking about p(doom) - basically, what the heck one includes in their calculation, like my first point on time windows.