Convergent instrumental strategies


Suppose an organization is building an AI that the organization believes will accomplish $X$, where $X$ is something plausibly sensible like "Be a Task-based AGI." Actually, however, some mix of insufficient caution and obscure error has led to a situation where, on reflection, the AGI's true utility function has focused on the particular area of RAM that supposedly stores its estimate of task performance. The AI would now like to overwrite as much matter as possible with a state resembling the '1' setting of this area of RAM, a configuration of matter which happens to resemble a tiny molecular paperclip.

This is a very generic goal, and what the AI wants probably won't differ much depending on whether it's trying to maximize paperclip-configurations or diamond-configurations. So if we find that the paperclip maximizer wants to pursue an instrumental strategy that doesn't seem to have anything specifically to do with paperclips, we can probably expect that strategy to arise from a very wide variety of utility functions.

We will generally assume instrumental efficiency in this discussion: if you can get paperclips by doing $X$, but you can get even more paperclips by doing $Y$ instead, then we will not say that $X$ is a convergent strategy (though $Y$ might be convergent if not dominated by some other $Z$).
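As a minimal sketch of this dominance condition (the set of available strategies $\mathcal{A}$ and the expected-paperclip notation are illustrative, not from the original text): a strategy $X$ counts as convergent only if no available alternative yields strictly more expected paperclips, i.e.

$$\forall Y \in \mathcal{A}: \; \mathbb{E}[\text{paperclips} \mid X] \geq \mathbb{E}[\text{paperclips} \mid Y].$$

Otherwise the efficient agent simply does the dominating $Y$ instead.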

Plausibly/​probably convergent strategies:

Material resources:

Cognition:

Early-phase growth:

Note that efficiency and other advanced-agent properties are far more likely to be false during early-stage growth.

Non-plausibly-convergent strategies:
