I agree in general that pursuing multiple alternative alignment approaches (and using them all together to create higher levels of safety) is valuable. I am more optimistic than you that we can design control systems (different from time-horizon-based myopia) that will remain stable and understandable even at higher levels of AGI competence.
“it still seems likely that someone, somewhere, will try fiddling around with another AGI’s time horizon parameters and cause a disaster.”
Well, if you worry about people fiddling with control system tuning parameters, you also need to worry about someone fiddling with value learning parameters so that the AGI learns only the values of a single group of people who would like to rule the rest of the world. Assuming that AGI is possible, I believe it is most likely that Bostrom’s orthogonality thesis will hold for it. I am not optimistic about designing an AGI system that is inherently fiddle-proof.
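To make the worry concrete, here is a minimal toy sketch (hypothetical code, not anyone’s actual proposal) of time-horizon-based myopia. The point it illustrates is that the safety property reduces to a single mutable parameter, which is exactly what makes fiddle-proofing hard; the same concern applies, one level up, to value learning parameters.

```python
# Toy sketch of time-horizon-based myopia (illustrative only).
# The agent's entire safety property hinges on one mutable number.

from dataclasses import dataclass
from typing import List


@dataclass
class MyopicPlanner:
    """Toy planner that only values rewards within `horizon` steps."""
    horizon: int  # the safety-critical tuning parameter

    def value(self, predicted_rewards: List[float]) -> float:
        # Rewards beyond the horizon are ignored entirely, so plans
        # that pay off only in the far future score no extra value.
        return sum(predicted_rewards[: self.horizon])


planner = MyopicPlanner(horizon=10)

# A plan that pays off only far in the future looks worthless...
long_con = [0.0] * 50 + [1000.0]
print(planner.value(long_con))  # 0.0

# ...until someone "fiddles" with one attribute.
planner.horizon = 100
print(planner.value(long_con))  # 1000.0
```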