If we restrict the space of its terminal goals to things we can imagine (and then set about proving each one friendly), then we can be sure that, even if it thinks in ways we cannot fathom, as long as its goal structure doesn’t change (which seems decoupled from intelligence, i.e., the paperclip maximizer), it will never do bad things X, Y, or Z, because it checks them against its terminal goal.
That directly contradicts EY’s CEV, where anything we can imagine is no more than a part of the Initial Dynamics. “Thou shalt...” or “Thou shalt not...” is not going to do the trick.
Right. Downgrading my estimate of how well I understand the problem.