Noosphere89 comments on CIRL Corrigibility is Fragile

Noosphere89 6 Jan 2023 15:46 UTC
1 point
−1

Given that Bayesianism itself might be the problem, (Bayesian) value uncertainty might in fact be a counterproductive move in the long term. Hard to say right now IMO, but I wouldn’t just want to assume CIRL as a starting point for figuring out corrigibility.

In my model, this is very close to an impossibility proof for jointly satisfying the desiderata of corrigibility and AI capabilities that exceed human capabilities.

In other words, corrigibility is doomed if Bayesian uncertainty can’t handle it.