(My sincere apologies for the delayed reply. I squeezed this shortform post out right before going on vacation to Asia, and am just now clearing my backlog to the point where I’m getting around to this.)
Ah, that’s a great point! I had read it a while back, but it wasn’t coming to mind when I was writing this. I think that’s an excellent example of a similar dynamic besides corrigibility. When I’m thinking about things, I usually flatten out the goal-space to ignore deconfusion (or however one wants to characterize the kind of progress towards one’s “true values”), but it’s clearly relevant here. Thanks for bringing it up!