Given that Bayesianism itself might be the problem, (Bayesian) value uncertainty might in fact be a counterproductive move in the long term. Hard to say right now IMO, but I wouldn’t just want to assume CIRL as a starting point for figuring out corrigibility.
In my model, this is very close to an impossibility proof for jointly satisfying the desiderata of corrigibility and AI capabilities stronger than human capabilities.
In other words, corrigibility is doomed if Bayesian uncertainty can’t handle it.