[Question] Unknown Unknowns in AI Alignment

It seems to me that no matter how many problems from different research agendas we solve, there is always the possibility that some ‘unknown unknown’ misalignment scenario could occur. I can imagine an approach of building model-agnostic, environment-agnostic, minimal-assumption alignment guarantees (which seems extremely hard), but even then I feel like things could go wrong in myriad other ways.

Has there been any discussion of how we might deal with these unknown unknowns?