AGI Predictions
This post is a collection of key questions that feed into AI timelines and AI safety work where it seems like there is substantial interest or disagreement amongst the LessWrong community.
You can make a prediction on a question by hovering over the widget and clicking. You can update your prediction by clicking at a new point, and remove your prediction by clicking on the same point. Try it out:
Great post! I am very curious about how people are interpreting Q10 and Q11, and what their models are. What are prototypical examples of ‘insights on a similar level to deep learning’?
Here’s a break-down of examples of things that come to my mind:
Historical DL-level advances:
the development of RL (Q-learning algorithm, etc.)
Original formulation of a single neuron i.e. affine transformation + non-linearity
Future possible DL-level:
a successor to back-prop (e.g. how biological neurons learn)
a successor to the Q-learning family (e.g. neatly generalizing and extending ‘intrinsic motivation’ hacks)
full brain simulation
an alternative to the affine+activation recipe
Below DL-level major advances:
an elegant solution to learn from cross-modal inputs in a self-supervised fashion (babies somehow do it)
a breakthrough in active learning
a generalizable solution to learning disentangled and compositional representations
a solution to adversarial examples
Grey areas:
breakthroughs in neural architecture search
a breakthrough in neural Turing machine-type research
I’d also like to know how people’s thinking fits in with my taxonomy: Are people who leaned yes on Q11 basing their reasoning on the inadequacy of the ‘below DL-level advances’ list, or perhaps on the necessity of the ‘DL-level advances’ list? Or perhaps people interpreted those questions completely differently, and don’t agree with my dividing lines?
Thank you for asking this question and for giving that break-down. I was wondering something similar. I am not an AI scientist but DL seems like a very big deal to me, and thus I was surprised that so many people seemed to think we need more insights on that level. My charitable interpretation is that they don’t think DL is a big deal.
At time of writing, I’m assigning the highest probability to “Will AGI cause an existential catastrophe?” at 85%, with the next-highest predictions at 80% and 76%. Why … why is everyone so optimistic?? Did we learn something new about the problem actually being easier, or our civilization more competent, than previously believed?
Should—should I be trying to do more x-risk-reduction-relevant stuff (somehow), or are you guys saying you’ve basically got it covered? (In 2013, I told myself it was OK for dumb little ol’ me to personally not worry about the Singularity and focus on temporal concerns in order to not have nightmares, and it turned out that I have a lot of temporal concerns which could be indirectly very relevant to the main plot, but that’s not my real reason for focusing on them.)
IMO, we decidedly do not “basically have it covered.”
That said, IMO it is generally not a good idea for a person to try to force themselves on problems that will make them crazy, desperate need or no.
I am often tempted to downplay how much catastrophe-probability I see, basically to decrease the odds that people decide to make themselves crazy in the direct vicinity of alignment research and alignment researchers.
And on the other hand, I am tempted by the HPMOR passage:
“Girls?” whispered Susan. She was slowly pushing herself to her feet, though Hermione could see her limbs swaying and quivering. “Girls, I’m sorry for what I said before. If you’ve got anything clever and heroic to try, you might as well try it.”
(To be clear, I have hope. Also, please just don’t go crazy and don’t do stupid things.)
For me, it’s because there’s disjunctively many ways that AGI could not happen (global totalitarian regime, AI winter, 55% CFR avian flu escapes a BSL4 lab, unexpected difficulty building AGI & the planning fallacy on timelines which we totally won’t fall victim to this time...), or that alignment could be solved, or that I could be mistaken about AGI risk being a big deal, or…
Granted, I assign small probabilities to several of these events. But my credence for P(AGI extinction | no more AI alignment work from community) is 70% - much higher than my 40% unconditional credence. I guess that means yes, I think AGI risk is huge (remember that I’m saying “40% chance we just die to AGI, unconditionally”), and that’s after incorporating the significant contributions which I expect the current community to make. The current community is far from sufficient, but it’s also probably picking a good amount of low-hanging fruit, and so I expect that its presence makes a significant difference.
EDIT: I’m decreasing the 70% to 60% to better match my 40% unconditional, because only the current alignment community stops working on alignment.
Some reasons.
I’ve gone from roughly 2/3 to 1/2 on existential catastrophe (I’ve put 58% here, was feeling pessimistic) based on the big projects having safety teams who I think are doing really good work. That probably falls under our civilization being more competent than previously believed.
There is a huge difference in the responses to Q1 (“Will AGI cause an existential catastrophe?”) and Q2 (“...without additional intervention from the existing AI Alignment research community”), to a point that seems almost unjustifiable to me. To pick the first matching example I found (and not to purposefully pick on anybody in particular), Daniel Kokotajlo thinks there’s a 93% chance of existential risk without the AI Alignment community’s involvement, but only 53% with. This implies that there’s a ~43% chance of the AI Alignment community solving the problem, conditional on it being real and unsolved otherwise, but only a ~7% chance of it not occurring for any other reason, including the possibility of it being solved by the researchers building the systems, or the concern being largely incorrect.
What makes people so confident in the AI Alignment research community solving this problem, far above that of any other alternative?
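For anyone who wants to check the ~43% and ~7% figures above, here is a minimal sketch of the arithmetic. It assumes one particular reading of the two questions: that the community's work only matters in worlds where catastrophe would otherwise occur, so that Q1 = Q2 × (1 − chance the community solves it). The variable names are mine, not part of the original questions.

```python
# Figures quoted in the comment above.
p_catastrophe_without_community = 0.93  # answer to Q2
p_catastrophe_with_community = 0.53     # answer to Q1

# Chance that catastrophe is avoided for any reason other than the community.
p_averted_otherwise = 1 - p_catastrophe_without_community

# Assumed reading: Q1 = Q2 * (1 - p_community_solves), solved for
# p_community_solves.
p_community_solves = 1 - (
    p_catastrophe_with_community / p_catastrophe_without_community
)

print(f"averted for other reasons: {p_averted_otherwise:.0%}")
print(f"averted by the community, given it was needed: {p_community_solves:.0%}")
```

Under a different reading of Q2 the implied numbers would change, so this only reproduces the comparison as stated, not a claim about what the predictor actually had in mind.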
I also noticed Daniel’s difference in probabilities there, and thought they were substantial. But it doesn’t seem unreasonable to me. The existing AI x-risk community has changed the global conversation on AI and also been responsible for much in the way of funding and direct research on many related technical problems. I could talk about the specific technical work, or the impact that things like the AI FOOM Debate had on Superintelligence had on OpenPhil, or CFAR on FLI on Musk on OpenAI. Or I could go into detail about the research being done on topics like Iterated Amplification and Agent Foundations and so on and ways that this seems to me to be clear progress on subproblems. I’m not sure exactly what alternatives you might have in mind.
To emphasize, the clash I’m perceiving is not the chance assigned to these problems being tractable, but to the relative probability of ‘AI Alignment researchers’ solving the problems, as compared to everyone else and every other explanation. In particular, people building AI systems intrinsically spend a degree of their effort, even if completely unconvinced about the merits of AI risk, trying to make systems aligned, just because that’s a fundamental part of building a useful AI.
I have a sort of Yudkowskian pessimism towards most of these things (policy won’t actually help; Iterated Amplification won’t actually work), but I’ll try to put that aside here for a bit. What I’m curious about is what makes these sort of ideas only discoverable in this specific network of people, under these specific institutions, and particularly more promising than other sorts of more classical alignment.
Isn’t Iterated Amplification in the class of things you’d expect people to try just to get their early systems to work, at least with ≥20% probability? Not, to be clear, exactly that system, but just fundamentally RL systems that take extra steps to preserve the intentionality of the optimization process.
To rephrase a bit, it seems to me that a worldview in which AI alignment is sufficiently tractable that Iterated Amplification is a huge step towards a solution, would also be a worldview in which AI alignment is sufficiently easy (though not necessarily easy) that there should be a much larger prior belief that it gets solved anyway.
FWIW, I made these judgments quickly and intuitively and thus could easily have just made a silly mistake. Thank you for pointing this out.
So, what do I think now, reflecting a bit more?
--The 7% judgment still seems correct to me. I feel pretty screwed in a world where our entire community stops thinking about this stuff. I think it’s because of Yudkowskian pessimism combined with the heavy-tailed nature of impact and research. A world without this community would still be a world where people put some effort into solving the problem, but there would be less effort, by less capable people, and it would be more half-hearted/not directed at actually solving the problem/not actually taking the problem seriously.
--The other judgment? Maybe I’m too optimistic about the world where we continue working. But idk, I am rather impressed by our community and I think we’ve been making steady progress on all our goals over the last few years. Moreover, OpenAI and DeepMind seem to be taking safety concerns mildly seriously due to having people in our community working there. This makes me optimistic that if we keep at it, they’ll take it very seriously, and that would be great.
I interpreted the question as something like “if nobody cares about safety and there isn’t a community that takes a special interest in it, will we be safe”. I don’t think it’s specifically this AI Alignment community solving it, it’s just that if nobody tries to solve the problem, the problem will stay unsolved.
Edit: And I do now see that I misinterpreted the question. Updated my second estimate downwards because of that. Thanks for pointing this out!
In the following, an event is “catastrophic” if it endangers several human lives; it need not be an existential catastrophe.
Edit: I meant to say “deceptive alignment”, but the meaning should be clear either way.
“Catastrophic” is normally used in the term “global catastrophic risk” and means something like “kills 100,000s of people”, so I do think “doesn’t necessarily kill but could’ve killed a couple of people” is a fairly different meaning. In retrospect I realize that I put my answer to the second question far too high: if it just means “a deceptively aligned system nearly gives a few people in hospital a fatal dosage but it’s stopped and we don’t know why the system messed up” then it’s quite plausible nothing this substantial will happen as a result of that.
Agreed. In retrospect, I might have opted for “pre-AGI nearly-deadly accident caused by deceptive alignment.”
I intended the situation to be more like “we catch the AI pretending to be aligned, but actually lying, and it kills or almost kills at least a few people as a result of that.”
With #1, I’m trying to have people predict the “deception is robustly instrumental behavior, but AIs will be bad at it at first and so we’ll catch them.” #2 is trying to operationalize whether this would be viewed as a fire alarm.
Some ways you might think scenario #1 won’t happen:
You don’t think deception will be incentivized
Fast takeoff means the AI is never smart enough to deceive but dumb enough to get caught
Our transparency tools won’t be good enough for many people to believe it was actually deceptively aligned
Also: we solve alignment really well on paper, and that’s why deception doesn’t arise. (I assign non-trivial probability to this.)
I suspect this is intentional, but the set {1,6,7,8} of predictions is redundant, in the sense that probabilities for three of them mathematically imply the probability of the fourth due to the law of total probability.
In particular, if #1 is A and #6 is B, then #7 and #8 are A|B and A|¬B, and we have the equality
P(A) = P(A|B)P(B) + P(A|¬B)P(¬B)
The probability I would assign to #8 intuitively is about 0.41. Math based on my other three predictions yields (doing the calculation now) 0.476. I am going to predict the math output rather than my intuition.
Did anyone else calculate their level of inconsistency?
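If you want to run the same check on your own answers, here is a rough sketch. It follows the labelling above (#1 is A, #6 is B, #7 is A|B, #8 is A|¬B); the function names and the example inputs are hypothetical and are not anyone's actual predictions.

```python
def implied_p_a_given_not_b(p_a: float, p_b: float, p_a_given_b: float) -> float:
    """Solve P(A) = P(A|B)P(B) + P(A|not B)P(not B) for P(A|not B)."""
    if not 0 <= p_b < 1:
        raise ValueError("P(B) must be in [0, 1) so that P(not B) > 0")
    return (p_a - p_a_given_b * p_b) / (1 - p_b)


def inconsistency(
    p_a: float, p_b: float, p_a_given_b: float, p_a_given_not_b: float
) -> float:
    """Stated P(A) minus the P(A) implied by the other three answers."""
    implied_p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a - implied_p_a


# Hypothetical answers to #1, #6 and #7, chosen only for illustration.
p_a, p_b, p_a_given_b = 0.40, 0.50, 0.30
implied = implied_p_a_given_not_b(p_a, p_b, p_a_given_b)

# Compare against a hypothetical intuitive answer to #8.
gap = inconsistency(p_a, p_b, p_a_given_b, p_a_given_not_b=0.41)
print(f"implied #8: {implied:.3f}, gap in #1: {gap:+.3f}")
```

Note that the implied value can land outside [0, 1], which means the other three answers cannot all be right at once, whatever you put for the fourth.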
I think the correct response to this realization is not to revise your final answer so as to make it consistent with the first three. It is to revise all four answers so that they are maximally intuitive, subject to the constraint that they be jointly consistent. Which answer comes last is just an artifact of the order of presentation, so it isn’t a rational basis for privileging some answers over others.
This is only true if, for example, you think AI would cause GDP growth. My model assigns a lot of probability to ‘AI kills everyone before (human-relevant) GDP goes up that fast’, so questions #7 and #8 are conditional on me being wrong about that. If we can last any small multiples of a year with AI smart enough to double GDP in that timeframe, then things probably aren’t as bad as I thought.
How to add your own questions:
Go to elicit.org/binary
Type your question into the field at the top
Click on the question title, and click the copy URL button
Paste the URL into the LessWrong editor
See our launch post for more details!
I suspect this question is misworded:
Will there be a 4 year interval in which world GDP growth doubles before the first 1 year interval in which world GDP growth doubles?
Do you mean in which world GDP doubles? World GDP growth doubles when it goes from, say, 0.5% yearly growth to 1% yearly growth.
Personally, I suspect world GDP is most likely to next double in a period after a severe war or depression, so you might want to rephrase to avoid that scenario if that isn’t what you’re thinking about.
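To make the gap between the two readings concrete, here is a small sketch of what "world GDP doubles" demands in terms of annual growth, assuming a constant compound growth rate over the interval. The function names are mine.

```python
import math


def required_annual_growth(years: float) -> float:
    """Constant annual growth rate needed for a quantity to double in `years`."""
    return 2 ** (1 / years) - 1


def years_to_double(growth_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + growth_rate)


# GDP doubling within 1 year vs. within 4 years.
print(f"1-year doubling needs {required_annual_growth(1):.1%} annual growth")
print(f"4-year doubling needs {required_annual_growth(4):.1%} annual growth")

# GDP *growth* doubling from 0.5% to 1% only shortens the doubling time.
print(f"doubling time at 0.5%: {years_to_double(0.005):.0f} years")
print(f"doubling time at 1.0%: {years_to_double(0.01):.0f} years")
```

So a doubling of the growth rate from 0.5% to 1% is a far weaker event than GDP itself doubling in one or four years.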
This was a good catch! I did actually mean world GDP, not world GDP growth. Because people have already predicted on this, I added the correct questions above as new questions, and am leaving the previous questions here for reference:
I really appreciate the effort that went into collecting all of these questions, framing them clearly, and coding the clickable predictions.
“Will > 50% of AGI researchers agree with safety concerns by 2030?”
From my research, I think they mostly already do; they just use different framings and care about different time frames.
Fwiw, I think the operationalization of the question is stronger than it appears at first glance, and that’s why estimates are low.
That was fun. This time, I tried not to update too much on other people’s predictions. In particular, I’m at 1% for “Will we experience an existential catastrophe before we build AGI?” and at 70% for “Will there be another AI Winter (a period commonly referred to as such) before we develop AGI?”, but would probably defer to a better aggregate on the second one.
So the following, for example, don’t count as “existential risk caused by AGI”, right?
many AIs:
an economy run by advanced AIs amplifying negative externalities, such as pollution, leading to our demise
an em world with minds evolving to the point of being non-valuable anymore (“a Disneyland without children”)
a war by transcending uploads
narrow AI:
a narrow AI killing all humans (ex.: by designing grey goo, a virus, etc.)
a narrow AI eroding trust in society until it breaks apart
intermediary cause by an AGI, but not ultimate cause:
a simulation shutdown because our AI didn’t have a decision theory for acausal cooperation
an AI convincing a human to destroy the world
Thanks a lot for the feature and this post! I’ll be really interested in an analysis after a lot of answers are in.
Wouldn’t it be better to have the other votes visible only after voting? People could be highly influenced by seeing how many and who voted what.
I’ve been seeing an intermittent bug on a few of these where tapping to record an answer causes the question text to disappear. Sometimes scrolling away and back fixes it.
Chrome browser on Android phone.
This is intentional. The question text shares space with the list of users and their respective predictions. On mobile, this means when you tap on a section, you see the users who voted in the corresponding range, until you tap away.
Ah, makes sense. I guess I just need to get used to the interface.
Yeah, we had to make some tradeoffs because I really wanted them to fit into a small space, and also to never resize when you interact with them, while also not dominating any post they are in. Not sure whether we hit the perfect balance of the tradeoffs.
What level of background in AI alignment are you assuming/desiring for respondents? Is it just “all readers” where the assumption is that any cultural osmosis etc. is included in what you’re trying to measure?
Yeah, any LWer is welcome to record their predictions :)