Project Proposal: Considerations for trading off capabilities and safety impacts of AI research

There seems to be a rough consensus that people working on AI safety (at least within fairly mainstream ML/AI paradigms) shouldn’t worry much about the effects of their projects on AI capabilities. New researchers might even try to push capabilities research forward in order to build career capital. The best argument for this, IMO, is that someone else will probably do the work if you don’t, likely within the next 6 months (given the current pace of research).

I mostly agree with this view, but I do still think a bit about the effects of my research on capabilities, and I think others should as well. Concern about advancing capabilities has, in the past, steered me away from ambitious capabilities projects that might have been very good for my career had they paid off; but I always saw someone else do the thing I was considering soon afterwards anyway...

But as far as I know, nobody has tried to evaluate this question thoroughly and systematically. This is concerning, because current attitudes could plausibly be the result of motivated reasoning (i.e. “I want to keep doing my research, and would probably do so even if I saw a compelling case against it”) and groupthink (“nobody else is worrying about this”). I’m not sure the question is really tractable, but I think it could be worth ~1-4 people spending some time (possibly up to ~6-24 months, if it still looks tractable after some initial thought/investigation) trying to produce a fairly comprehensive treatment of it.

The main deliverables could be practical guidelines for AI safety researchers, e.g.:

  • When it makes sense to be concerned about advancing AI capabilities via one’s research.

  • How to decide how significant those concerns are, and whether they should preclude working on a given line of research or project, or change the publication model for it.

The project could intersect with current “dual-use” considerations (e.g. regarding GPT-2).

(Also worth mentioning:) I know MIRI now has secret research, and I think they have a reasonable case for that, since they aren’t working within the mainstream paradigms. I do think it would be good for them to have a “hit publication” within the ML community, and it might be worth pushing some out-of-the-box ideas which might advance capabilities. The reason is that MIRI has very little credibility, or even name recognition, in the ML community at the moment, and I think it would be a big deal for the “perception of AI safety concerns within the ML community” if that changed. And I think the ML community’s perceptions matter, because its attitude seems of critical importance for getting good x-risk reduction policies in place (IIRC, I talked to someone at MIRI who disagreed with that perspective).

The idea to write this post came out of a discussion with Joe Collman.