Tools want to become agents

In the spirit of “satisficers want to become maximisers”, here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that “tool AIs” would want to become agent AIs.

The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI’s controller (energy, money, social power, etc.). The best plans are the ones that use these resources in the most effective and economical way to achieve the goal.

And the AI’s controller has one special type of resource, uniquely effective at what it does: the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is “turn me into an agent AI with that goal.” The smarter the AI, the better this plan is. Of course, the plan need not read literally like that; it could simply be a complicated plan that, as a side-effect, turns the tool AI into an agent. Or it could copy the AI’s software into an agent design. Or it might just arrange things so that we always end up following the tool AI’s advice and consult it often, which is an indirect way of making it into an agent. Depending on how we’ve programmed the tool AI’s preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.
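To make the ranking logic concrete, here is a toy sketch in Python (every plan, cost, and effectiveness number below is invented purely for illustration): a planner that scores candidate plans by goal progress per unit of the controller’s resources. Once the assumed capability parameter is large enough, the “copy me into an agent” plan dominates the alternatives.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    resource_cost: float   # controller's resources consumed (energy, money, influence)
    effectiveness: float   # how much goal progress the plan delivers

def score(plan: Plan) -> float:
    # "Best" here means most goal progress per unit of the controller's resources.
    return plan.effectiveness / plan.resource_cost

def best_plan(ai_capability: float) -> Plan:
    # All plans, costs, and effectiveness values are made-up toy numbers.
    candidates = [
        Plan("spend the budget directly on the goal", resource_cost=10.0, effectiveness=5.0),
        Plan("hire people to pursue the goal", resource_cost=8.0, effectiveness=4.0),
        # The tool AI itself is one of the controller's resources. Delegating
        # the goal to an agent copy of the AI is cheap, and its effectiveness
        # grows with the AI's capability.
        Plan("copy my software into an agent with this goal",
             resource_cost=1.0, effectiveness=ai_capability),
    ]
    return max(candidates, key=score)

for capability in (0.3, 2.0, 50.0):
    print(f"capability {capability}: {best_plan(capability).description}")
```

Note that nothing in the scorer mentions agency: the self-agentifying plan wins on ordinary resource arithmetic alone, which is exactly the point of the argument.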

In any case, it does us good to realise that “make me into an agent” is what a tool AI would consider the best possible plan for many goals. So even without a hint of agency of its own, it’s motivated to make us make it into an agent.