I think 1% in the next year and a half is significantly too low.
Firstly, conditioning on AGI researchers makes a pretty big difference. It rules out most mainstream AI researchers, including many of the most prominent ones who get the most media coverage. So I suspect your gut feeling about what people would say isn’t taking this sufficiently into account.
Secondly, I think attributing ignorance to the outgroup is a pretty common fallacy, so you should be careful of that. I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward-gaming or manipulating humans. Maybe they couldn’t give very concrete disaster scenarios, but neither can many of us.
And thirdly, once you get agreement that there are problems, you basically get “we should fix the problems first” for free. I model most AGI researchers as thinking that AGI is far enough away that we can figure out practical ways to prevent these things, like better protocols for giving feedback. So they’ll agree that we should do that first, because they think that it’ll happen automatically anyway.
In a similar vein, I found several resources that make me think the probability should be higher than 1%, both currently and over the next 1.5 years:
This 2012/13 paper by Vincent Müller and Nick Bostrom surveyed AI experts, in particular 72 people who attended AGI workshops (most of whom do technical work). Of these 72, 36% thought that, assuming high-level machine intelligence (HLMI) would at some point exist, it would be either ‘on balance bad’ or ‘extremely bad’ for humanity. Obviously this isn’t an indication that they understand or agree with safety concerns, but it directionally suggests people are concerned and thinking about this.
This 2017 paper by Seth Baum identified 45 projects on AGI and their stance on safety (page 25). Of these, 12 were active on safety (dedicated efforts to address AGI safety issues), 3 were moderate (acknowledge safety issues, but don’t have dedicated efforts to address them), and 2 were dismissive (argue that AGI safety concerns are incorrect). The remaining 28 did not specify their stance.
This is relevant, but I tend to think this sort of evidence isn’t really getting at what I want. My main reaction is one that you already said:
> Obviously this isn’t an indication that they understand or agree with safety concerns, but directionally suggests people are concerned and thinking about this.
I think many people have a general prior of “we should be careful with wildly important technologies”, and so will say things like “safety is important” and “AGI might be bad”, without having much of an understanding of why.
Also, I don’t expect the specific populations surveyed in those two sources to overlap much with “top AI researchers” as defined in the question, though I have low confidence in that claim.
These seem like sensible comments to me; I had similar thoughts about current understanding of things like reward gaming. I’d be curious to see your snapshot.
a surprisingly powerful demonstration soon could change things too; 1% seems low. look at how quickly views can change about things like ‘it’s just the flu’, the current wave of updating from GPT-3 (among certain communities), etc.
one conceptual contribution I’d put forward for consideration is whether this question may be more about emotions or social equilibria than about reaching a reasoned intellectual consensus. it’s worth considering how a relatively proximate/homogeneous group of people tends to change its beliefs. for better or worse, everything from viscerally compelling demonstrations of safety problems, to social pressure, to coercion or top-down influence, to the transition from intellectual to grounded/felt risk should be part of the model of change, alongside rational, lucid, considered debate tied to deeper understanding or the truth of the matter. the demonstration doesn’t actually have to be a compelling demonstration of risks to be a compelling illustration of them (imagine a really compelling VR experience, as a trivial example).
maybe the term I’d use is ‘belief cascades’, and I might point to a rapid shift towards office closures during early COVID as an example of this. the tipping point arrived sooner than some expected, not due to considered updates in beliefs about risk or the utility of closures (the evidence had been there for a while), but rather from a cascade of fear, a noisy consensus that not acting/thinking in alignment with the perceived consensus (‘this is a real concern’) would lead to social censure, etc.
in short, this might happen sooner, more suddenly, and for stranger reasons than I think the prior distribution implies.
NB the point about a newly unveiled population of researchers in my first bin might stretch the definition of ‘top AI researchers’ in the question specification, but I believe it’s in line with the spirit of the question
+1 for the general idea of belief cascades. This is an important point, though I had already considered it. When I said “percolates to the general AI community over the next few years” I wasn’t imagining that this would happen via reasoned intellectual discourse, I was more imagining compelling demonstrations (which may or may not be well-connected to the actual reasons for worry).
> I think a clear majority of AGI researchers are probably familiar with the concept of reward gaming by now, and could talk coherently about AGIs reward-gaming or manipulating humans.
Seems plausible, but I specifically asked for reward gaming, instrumental convergence, and the challenges of value learning. (I’m fine with not having concrete disaster scenarios.) Do you still think this is that plausible?
> And thirdly, once you get agreement that there are problems, you basically get “we should fix the problems first” for free.
I agree that Q2 is more of a blocker than Q3, though I am less optimistic than you seem to be.
Overall I updated towards slightly sooner based on your comment and Beth’s comment below (given that both of you interact with more AGI researchers than I do), but not that much. I’m not sure whether you were looking at just reward gaming or all three conditions I laid out, and most of the other considerations were ones I had already thought about; it’s not obvious how to update on an argument of the form “I think <already-considered consideration>, therefore you should update in this direction”. It would have been easier to update on “I think <already-considered consideration>, therefore the absolute probability in the next N years is X%”.
my (quickly-made) snapshot: https://elicit.ought.org/builder/dmtz3sNSY
Yeah I also thought this might just be true already, for similar reasons