I think 1% in the next year and a half is significantly too low.
Firstly, conditioning on AGI researchers makes a pretty big difference. It rules out most mainstream AI researchers, including many of the most prominent ones who get the most media coverage. So I suspect your gut feeling about what people would say isn’t taking this sufficiently into account.
Secondly, I think attributing ignorance to the outgroup is a pretty common fallacy, so you should be careful of that. I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward-gaming or manipulating humans. Maybe they couldn’t give very concrete disaster scenarios, but neither can many of us.
And thirdly, once you get agreement that there are problems, you basically get “we should fix the problems first” for free. I model most AGI researchers as thinking that AGI is far enough away that we can figure out practical ways to prevent these things, like better protocols for giving feedback. So they’ll agree that we should do that first, because they think that it’ll happen automatically anyway.
In a similar vein, I found several resources that make me think the probability should be higher than 1%, both currently and over the next 1.5 years:
This 2012/13 paper by Vincent Müller and Nick Bostrom surveyed AI experts, in particular 72 people who attended AGI workshops (most of whom do technical work). Of these 72, 36% thought that, assuming high-level machine intelligence (HLMI) would at some point exist, it would be either ‘on balance bad’ or ‘extremely bad’ for humanity. Obviously this isn’t an indication that they understand or agree with safety concerns, but it directionally suggests people are concerned and thinking about this.
This 2017 paper by Seth Baum identified 45 projects on AGI and their stance on safety (page 25). Of these, 12 were active on safety (dedicated efforts to address AGI safety issues), 3 were moderate (acknowledge safety issues, but don’t have dedicated efforts to address them), and 2 were dismissive (argue that AGI safety concerns are incorrect). The remaining 28 did not specify their stance.
This is relevant, but I tend to think this sort of evidence isn’t really getting at what I want. My main reaction is one that you already said:
> Obviously this isn’t an indication that they understand or agree with safety concerns, but directionally suggests people are concerned and thinking about this.
I think many people have a general prior of “we should be careful with wildly important technologies”, and so will say things like “safety is important” and “AGI might be bad”, without having much of an understanding of why.
Also, I don’t expect the specific populations surveyed in those two sources to overlap much with “top AI researchers” as defined in the question, though I have low confidence in that claim.
These seem like sensible comments to me; I had similar thoughts about current understanding of things like reward gaming. I’d be curious to see your snapshot.
a surprisingly powerful demonstration soon could change things too; 1% seems low. look at how quickly views can change about things like ‘it’s just the flu’, the current wave of updating from GPT-3 (among certain communities), etc.
one conceptual contribution I’d put forward for consideration is whether this question may be more about emotions or social equilibria than about reaching a reasoned intellectual consensus. it’s worth considering how a relatively proximate/homogeneous group of people tends to change its beliefs. for better or worse, everything from viscerally compelling demonstrations of safety problems, to social pressure, to coercion or top-down influence, to the transition from intellectual to grounded/felt risk should be part of the model of change, alongside rational, lucid, considered debate tied to deeper understanding or the truth of the matter. the demonstration doesn’t actually have to be a compelling demonstration of risks to be a compelling illustration of them (imagine a really compelling VR experience, as a trivial example).
maybe the term I’d use is ‘belief cascades’, and I might point to a rapid shift towards office closures during early COVID as an example of this. the tipping point arrived sooner than some expected, not due to considered updates in beliefs about risk or the utility of closures (the evidence had been there for a while), but rather from a cascade of fear, a noisy consensus that not acting/thinking in alignment with the perceived consensus (‘this is a real concern’) would lead to social censure, etc.
in short, this might happen sooner, more suddenly, and for stranger reasons than I think the prior distribution implies.
NB the point about a newly unveiled population of researchers in my first bin might stretch the definition of ‘top AI researchers’ in the question specification, but I believe it’s in line with the spirit of the question
+1 for the general idea of belief cascades. This is an important point, though I had already considered it. When I said “percolates to the general AI community over the next few years” I wasn’t imagining that this would happen via reasoned intellectual discourse, I was more imagining compelling demonstrations (which may or may not be well-connected to the actual reasons for worry).
> I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward-gaming or manipulating humans.
Seems plausible, but I specifically asked for reward gaming, instrumental convergence, and the challenges of value learning. (I’m fine with not having concrete disaster scenarios.) Do you still think this is that plausible?
> And thirdly, once you get agreement that there are problems, you basically get “we should fix the problems first” for free.
I agree that Q2 is more of a blocker than Q3, though I am less optimistic than you seem to be.
Overall I updated towards slightly sooner based on your comment and Beth’s comment below (given that both of you interact with more AGI researchers than I do), but not that much. I’m not sure whether you were looking at just reward gaming or all three conditions I laid out, and most of the other considerations were ones I had already thought about; it’s not obvious how to update on an argument of the form “I think <already-considered consideration>, therefore you should update in this direction”. It would have been easier to update on “I think <already-considered consideration>, therefore the absolute probability in the next N years is X%”.
my (quickly-made) snapshot: https://elicit.ought.org/builder/dmtz3sNSY
Yeah I also thought this might just be true already, for similar reasons