[Question] Unknown Unknowns in AI Alignment

It seems to me that no matter how many problems from different research agendas we solve, there is always the possibility that some 'unknown unknown' misalignment scenario can occur. I can imagine an approach of building model-agnostic, environment-agnostic, minimal-assumption alignment guarantees (which seems to be super hard), but I feel like things can go wrong in myriad other ways, even then.

Has there been any discussion about how we might go about this?