We don’t want our core values changed; we would really rather avoid the murder pill, and we’d put up resistance if someone tried to force one down our throat. That is a sensible strategy for steering away from a world full of murders.
OTOH, people do things that are known to modify values, such as travelling, getting an education and starting a family.
The trouble is that almost all goals (for most reasonable measures you could put on a space of goals) prescribe “don’t let your goal be changed”, because letting your goal get changed is usually a bad strategy for achieving your current goal.
A von Neumann rationalist isn’t necessarily incorrigible; it depends on the fine details of the goal specification. A goal of “ensure as many paperclips as possible in the universe” encourages self-cloning and discourages voluntary shutdown. A goal of “make paperclips while you are switched on” does not. “Make paperclips while that’s your goal”, even less so.
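To make that concrete, here is a minimal expected-utility sketch. The agent model, the numbers and the two utility functions are invented purely for illustration; the point is only that the same “paperclip” objective, phrased two different ways, flips whether a straightforward maximiser prefers to resist shutdown.

```python
# Toy expected-utility comparison: how the exact wording of a paperclip
# goal changes whether a rational maximiser prefers to resist shutdown.
# All numbers are invented for illustration.

HOURS_IF_COMPLIANT = 10      # uptime if the agent allows itself to be switched off
HOURS_IF_RESISTANT = 1000    # uptime if it resists shutdown and keeps running
CLIPS_PER_HOUR = 5

def u_universe_total(hours_on: float) -> float:
    """'Ensure as many paperclips as possible in the universe':
    utility counts every clip ever made, so more uptime is strictly better."""
    return CLIPS_PER_HOUR * hours_on

def u_while_switched_on(hours_on: float) -> float:
    """'Make paperclips while you are switched on':
    utility only scores how the agent behaves during its uptime;
    time spent switched off is simply not counted against it."""
    return CLIPS_PER_HOUR  # production rate while on, independent of total uptime

for name, u in [("universe total", u_universe_total),
                ("while switched on", u_while_switched_on)]:
    comply, resist = u(HOURS_IF_COMPLIANT), u(HOURS_IF_RESISTANT)
    verdict = "resists shutdown" if resist > comply else "indifferent to shutdown"
    print(f"{name:17s}  comply={comply:7.1f}  resist={resist:7.1f}  -> {verdict}")
```

Under the universe-total goal, resisting shutdown dominates; under the “while switched on” goal, extra uptime buys nothing, so there is no utility to be had from fighting the off switch.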
A great deal of the danger of AI arises from the fact that sufficiently smart reasoners are likely to converge on behaviors like “gain power” and “don’t let people shut me off.”
There is a solution. **If it is at all possible to instill goals, to align AI, the Instrumental Convergence problem can be countered by instilling terminal goals that are the exact opposite.** Remember, instrumental goals are always subservient to terminal ones. So, if we are worried about a powerful AI going on a resource-acquisition spree, we can give it a terminal goal to be economical in the use of resources.
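As a toy sketch of that idea (the weight LAMBDA and the plan numbers are made up, and this is nothing like a worked-out alignment proposal): fold a terminal frugality penalty into the utility function, and a resource-acquisition spree stops being an automatic win.

```python
# Toy sketch of the "opposite terminal goal" idea: build a terminal
# frugality term into the utility, so grabbing extra resources is no
# longer an automatic instrumental win. LAMBDA and the plan numbers
# are invented for illustration.

LAMBDA = 2.0  # how strongly the agent terminally values economy of resources

def utility(paperclips_made: float, resources_consumed: float) -> float:
    return paperclips_made - LAMBDA * resources_consumed

plans = {
    "modest plan":                dict(paperclips_made=100, resources_consumed=10),
    "resource-acquisition spree": dict(paperclips_made=160, resources_consumed=80),
}

for name, plan in plans.items():
    print(f"{name:28s}  utility = {utility(**plan):7.1f}")

best = max(plans, key=lambda name: utility(**plans[name]))
print("chosen plan:", best)
```

Whether the frugality term actually dominates depends on how heavily it is weighted, which is the same fine-details-of-the-goal-specification point as before.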