[Question] Should you publish solutions to corrigibility?

This question is partly motivated by observing recent discussions about corrigibility and wondering to what extent the people involved have thought about how their results might be used.

If there were practically implementable ways to make AGIs corrigible to arbitrary principals, a wide range of actors could eventually control powerful AGIs. Whether that would be net good or bad in expectation would depend on the values and morality of those principals.

Currently it seems highly unclear what kinds of people we should expect to end up in control of corrigible ASIs, if corrigibility were practically feasible.

What (crucial) considerations should one take into account when deciding whether to publish, or with whom to privately share, various kinds of corrigibility-related results?
