I proposed changing “instrumental convergence” to “robust instrumentality.” The proposal has not caught on, so I reverted the post’s terminology and will keep using ‘convergently instrumental.’ I do think ‘convergently instrumental’ makes more sense than ‘instrumentally convergent’: the agent isn’t “convergent for instrumental reasons”; rather, it is the instrumentality that is convergent across goals.
For the record, the post used to contain the following section:
A note on terminology
The robustness-of-strategy phenomenon became known as the instrumental convergence hypothesis, but I propose we call it robust instrumentality instead.
From the paper’s introduction:
An action is said to be instrumental to an objective when it helps achieve that objective. Some actions are instrumental to many objectives, making them robustly instrumental. The so-called instrumental convergence thesis is the claim that agents with many different goals, if given time to learn and plan, will eventually converge on exhibiting certain common patterns of behavior that are robustly instrumental (e.g. survival, accessing usable energy, access to computing resources). Bostrom et al.’s instrumental convergence thesis might more aptly be called the robust instrumentality thesis, because it makes no reference to limits or converging processes:
“Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.”
Some authors have suggested that gaining power over the environment is a robustly instrumental behavior pattern on which learning agents generally converge as they tend towards optimality. If so, robust instrumentality presents a safety concern for the alignment of advanced reinforcement learning systems with human society: such systems might seek to gain power over humans as part of their environment. For example, Marvin Minsky imagined that an agent tasked with proving the Riemann hypothesis might rationally turn the planet into computational resources.
This choice is not costless: many people are already accustomed to ‘instrumental convergence,’ which even has its own Wikipedia page. Nonetheless, if there were ever a time to make the shift, that time would be now.