Takeaways from safety by default interviews


Last year, several researchers at AI Impacts (primarily Robert Long and I) interviewed prominent researchers inside and outside of the AI safety field who are relatively optimistic about advanced AI being developed safely. These interviews were originally intended to focus narrowly on reasons for optimism, but we ended up covering a variety of topics, including AGI timelines, the likelihood of current techniques leading to AGI, and what the right things to do in AI safety are right now.

We talked to Ernest Davis, Paul Christiano, Rohin Shah, Adam Gleave, and Robin Hanson.

Here are some more general things I personally found noteworthy while conducting these interviews. For interview-specific summaries, check out our Interviews Page.

Relative optimism in AI often comes from the belief that AGI will be developed gradually, and problems will be fixed as they are found rather than neglected.

All of the researchers we talked to seemed to believe in non-discontinuous takeoff.1 Rohin gave ‘problems will likely be fixed as they come up’ as his primary reason for optimism,2 and Adam3 and Paul4 both mentioned it as a reason.

Relatedly, both Rohin5 and Paul6 said that one thing that could update their views was gaining information about how institutions relevant to AI will handle AI safety problems, potentially by seeing them solve relevant problems or by looking at historical examples.

I think this is a pretty big crux for the optimism view; my impression is that MIRI researchers generally think that 1) the development of human-level AI will likely be fast and potentially discontinuous and 2) people will be incentivized to hack around and redeploy AI when they encounter problems. See Likelihood of discontinuous progress around the development of AGI for more on 1). I think 2) could be a fruitful avenue for research; in particular, it might be interesting to look at recent examples of people in technology, particularly ML, correcting software issues, perhaps even when doing so ran against their short-term profit incentives. Adam said he thought the AI research community wasn’t paying enough attention to building safe, reliable systems.7

Many of the arguments I heard for relative optimism weren’t based on inside-view technical arguments.

This isn’t that surprising in hindsight, but it seems interesting to me that though we interviewed largely technical researchers, a lot of their reasoning wasn’t based particularly on inside-view technical knowledge of the safety problems. See the interviews for more evidence of this, but here’s a small sample of the not-particularly-technical claims made by interviewees:

  • AI researchers are likely to stop and correct broken systems rather than hack around and redeploy them.8

  • AI has progressed, and will continue to progress, via an accumulation of lots of small advances rather than via a sudden important insight.9

  • Many technical problems feel intractably hard in the way that AI safety feels now, and still get solved within ~10 years.10

  • Evolution baked very little into humans; babies learn almost everything from their experiences in the world.11

My instinct when thinking about AGI is to defer largely to safety researchers, but these reasons felt noteworthy to me in that they seemed like questions that were perhaps better answered by economists or sociologists (or, for the last claim, neuroscientists) than by safety researchers. I really appreciated Robin’s efforts to operationalize and analyze the second claim above.

(Of course, many of the claims were also more specific to machine learning and AI safety.)

There are lots of calls for individuals with differing views on AI risk to engage with each other and understand the reasoning behind fundamental disagreements.

This is especially true of the views held by MIRI, which many optimistic researchers reported not having a good understanding of.

This isn’t particularly surprising, but there was a strong, universal, and unprompted theme that there wasn’t enough engagement around AI safety arguments. Adam and Rohin both said they had a much worse understanding than they would like of others’ viewpoints.12 Robin13 and Paul14 both pointed to existing debates in the space that they considered meaningful but unfinished.

By Asya Bergal