[Question] What AI safety problems need solving for safe AI research assistants?

In his AI Safety “Success Stories” post, Wei Dai writes:

[This] comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive success story. Is this conclusion actually justified?

I share Wei Dai’s intuition that the Research Assistant path is neglected, and I want to better understand the safety problems involved in this path.

Specifically, I’m envisioning AI research assistants, built without any kind of reinforcement learning, that help AI alignment researchers identify, understand, and solve AI alignment problems. Some concrete examples:

Possible with yesterday’s technology: Document clustering that automatically organizes every blog post about AI alignment. Recommendation systems that find AI alignment posts similar to the one you’re reading and identify connections between the thinking of various authors.
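To make the “yesterday’s technology” examples concrete, here is a minimal sketch (assuming scikit-learn is available) of both ideas over a tiny hypothetical corpus: TF-IDF vectors for the posts, k-means for topic clustering, and cosine similarity for “posts similar to the one you’re reading”. The corpus and cluster count are stand-ins; a real system would ingest every alignment post.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in corpus of alignment post snippets.
posts = [
    "Inner alignment and mesa-optimizers in learned models.",
    "Mesa-optimization: when training produces an inner optimizer.",
    "Impact measures as a way to limit side effects.",
    "Penalizing side effects with relative reachability.",
]

# Represent each post as a TF-IDF vector.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

# Clustering: automatically group the posts into topics.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Recommendation: rank other posts by similarity to the one being read (index 0).
sims = cosine_similarity(X[0], X).ravel()
ranked = sims.argsort()[::-1][1:]  # most similar first, skipping the post itself
```

Even this crude pipeline surfaces the connection between the two mesa-optimization posts, which is the kind of cross-author linking the recommendation example is after.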

May be possible with current or near-future technology: An AI chatbot, trained on every blog post about AI alignment, which makes the case for AI alignment to skeptics or attempts to shoot down FAI proposals. Text summarization software that compresses a long discussion between two forum users in a way that both feel is accurate and fair. An NLP system that automatically organizes AI safety writings into a problem/solution table as I described in this post.
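The summarization example above is the most tractable of these; as a toy illustration of its extractive end, here is a sketch (hypothetical, standard-library only) that scores sentences by word frequency and keeps the top scorers. A real system that both discussants would call accurate and fair would need far more than this.

```python
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    # Score each sentence by the corpus-wide frequency of its words.
    scored = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(scored[:n_sentences])
    return " ".join(sentences[i] for i in keep)

summary = extractive_summary(
    "Alignment is hard. Alignment research needs more people. Cats are nice."
)
```

Frequency scoring is deliberately the simplest possible choice here; it drops the off-topic sentence while keeping the two that share the discussion’s dominant vocabulary.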

May be possible with future breakthroughs in unsupervised learning, generative modeling, natural language understanding, etc.: An AI system that generates novel FAI proposals, or writes code for an FAI directly, and tries to break its own designs. An AI system that augments the problem/solution table from this post with new rows and columns generated based on original reasoning.

What safety problems are involved in creating research assistants of this sort? I’m especially interested in safety problems which haven’t yet received much attention, and safety problems with advanced assistants based on future breakthroughs.
