Takeaways from safety by default interviews

Last year, several researchers at AI Impacts (primarily Robert Long and I) interviewed prominent researchers inside and outside of the AI safety field who are relatively optimistic about advanced AI being developed safely. These interviews were originally intended to focus narrowly on reasons for optimism, but we ended up covering a variety of topics, including AGI timelines, the likelihood of current techniques leading to AGI, and what the right things to do in AI safety are right now.

We talked to Ernest Davis, Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson.

Here are some more general things I personally found noteworthy while conducting these interviews. For interview-specific summaries, check out our Interviews Page.

Relative optimism in AI often comes from the belief that AGI will be developed gradually, and problems will be fixed as they are found rather than neglected.

All of the researchers we talked to seemed to believe in non-discontinuous takeoff.¹ Rohin gave ‘problems will likely be fixed as they come up’ as his primary reason for optimism,² and Adam³ and Paul⁴ both mentioned it as a reason.

Relatedly, both Rohin⁵ and Paul⁶ said one thing that could update their views was gaining information about how institutions relevant to AI will handle AI safety problems, potentially by seeing them solve relevant problems or by looking at historical examples.

I think this is a pretty big crux around the optimism view; my impression is that MIRI researchers generally think that 1) the development of human-level AI will likely be fast and potentially discontinuous and 2) people will be incentivized to hack around and redeploy AI when they encounter problems. See Likelihood of discontinuous progress around the development of AGI for more on 1). I think 2) could be a fruitful avenue for research; in particular, it might be interesting to look at recent examples of people in technology, particularly ML, correcting software issues, perhaps even when doing so is against their short-term profit incentives. Adam said he thought the AI research community wasn’t paying enough attention to building safe, reliable systems.⁷

Many of the arguments I heard around relative optimism weren’t based on inside-view technical arguments.

This isn’t that surprising in hindsight, but it seems interesting to me that though we interviewed largely technical researchers, a lot of their reasoning wasn’t based particularly on inside-view technical knowledge of the safety problems. See the interviews for more evidence of this, but here’s a small sample of the not-particularly-technical claims made by interviewees:

AI researchers are likely to stop and correct broken systems rather than hack around and redeploy them.⁸
AI has and will progress via a cumulation of lots of small things rather than via a sudden important insight.⁹
Many technical problems feel intractably hard in the way that AI safety feels now, and still get solved within ~10 years.¹⁰
Evolution baked very little into humans; babies learn almost everything from their experiences in the world.¹¹

My instinct when thinking about AGI is to defer largely to safety researchers, but these reasons felt noteworthy to me in that they seemed like questions that were perhaps better answered by economists or sociologists (or, in the case of the last claim, neuroscientists) than safety researchers. I really appreciated Robin’s efforts to operationalize and analyze the second claim above.

(Of course, many of the claims were also more specific to machine learning and AI safety.)

There are lots of calls for individuals with views around AI risk to engage with each other and understand the reasoning behind fundamental disagreements.

This is especially true of MIRI’s views, which many optimistic researchers reported not understanding well.

This isn’t particularly surprising, but there was a strong, universal, and unprompted theme that there wasn’t enough engagement around AI safety arguments. Adam and Rohin both said they had a much worse understanding than they would like of others’ viewpoints.¹² Robin¹³ and Paul¹⁴ both pointed to some existing but meaningfully unfinished debate in the space.

— By Asya Bergal