Here’s an idea (inspired by “Less exploitable value-updating agent”) of how to model an agent that does nothing but gather resources.

It’s an agent which has a utility function $u = Xv$, for some utility function $v$. It doesn’t know whether $X = -1$ or $X = 1$ (maybe we used an approach like “Safe probability manipulation, superweapons, and stable self-improvement research” to guarantee ignorance), but it will find out tomorrow.

In the meantime, it will not seek to influence the value of $v$ (assuming it gives equal credence to both values of $X$, any action increasing $v$ also decreases $-v$ by the same amount, so the expected change in $u$ is zero), but will seek to gather as many resources as possible, to put itself in a position to act once it knows the value of $X$.
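The cancellation argument can be checked in a toy two-period model. This is only a sketch: the action names, the payoff numbers, and the rule that each unit of resources lets the agent move $v$ by one extra unit tomorrow are all assumptions made for illustration, not part of the original idea.

```python
# Toy two-period model of an agent with utility u = X * v, where X is
# +1 or -1 and is only revealed tomorrow. All numbers are illustrative.

P_X_PLUS = 0.5  # assumed credence that X = +1

# Today's actions: (change in v today, resources gained today).
ACTIONS = {
    "push_v_up": (1.0, 0.0),
    "push_v_down": (-1.0, 0.0),
    "gather": (0.0, 1.0),
}


def expected_utility(dv_today, resources, p=P_X_PLUS):
    """Expected total change in u = X * v over today and tomorrow.

    Assumption: tomorrow the agent knows X and can move v by
    (1 + resources) units in the direction that X favours.
    """
    total = 0.0
    for x, prob in ((1, p), (-1, 1.0 - p)):
        dv_tomorrow = x * (1.0 + resources)
        total += prob * x * (dv_today + dv_tomorrow)
    return total


def best_action():
    """Return the action name with the highest expected utility."""
    return max(ACTIONS, key=lambda name: expected_utility(*ACTIONS[name]))


if __name__ == "__main__":
    for name, (dv, res) in ACTIONS.items():
        print(name, expected_utility(dv, res))
    print("best:", best_action())
```

In this toy setup, today's direct contribution to expected utility is $(2p - 1)$ times the change in $v$, which vanishes at $p = 1/2$. Pushing $v$ up and pushing it down are then worth exactly the same, and only the resource term separates the actions.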

Now, this definition is somewhat dependent on the definition of $v$ (e.g. it would certainly want to be elected “president of the committee for setting the value of $v$” more than anything else), so a more thorough description might be some situation where the agent is completely ignorant about its future utility (but where the ignorance is symmetric, i.e. for any $v$, the probability of $-v$ is the same as that of $v$). This could be a pure resource gathering agent.
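A sketch of why symmetric ignorance removes any direct preference over outcomes, assuming the agent’s uncertainty can be written as a proper prior $p$ over utility functions with $p(v) = p(-v)$ and finite expectations (the symbols $p$, $o$, $r$, $\Pi(r)$ and $W$ are introduced here for illustration only):

$$\mathbb{E}_{v \sim p}[v(o)] = \tfrac{1}{2}\,\mathbb{E}_{v \sim p}[v(o)] + \tfrac{1}{2}\,\mathbb{E}_{v \sim p}[(-v)(o)] = 0 \quad \text{for every outcome } o.$$

The first equality uses the symmetry of $p$, which gives $\mathbb{E}_{v \sim p}[(-v)(o)] = \mathbb{E}_{v \sim p}[v(o)]$; the second uses $(-v)(o) = -v(o)$. So before learning its utility, the agent is indifferent between outcomes as such. What it can still value is how well it expects to do afterwards with resources $r$:

$$W(r) = \mathbb{E}_{v \sim p}\Big[\max_{\pi \in \Pi(r)} \mathbb{E}[v \mid \pi]\Big],$$

where $\Pi(r)$ is the set of policies available with resources $r$. If more resources only enlarge that set, i.e. $\Pi(r) \subseteq \Pi(r')$ whenever $r \le r'$, then $W$ is non-decreasing in $r$.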

Why could this be interesting? Well, I was wondering if we could take a generic agent and somehow “subtract off” a pure resource gathering agent from it. The resulting agent would pursue its goals, while also minimising how well it would do if it were a pure resource gathering agent.

The idea needs some developing, but there might be something there.

Resource gathering agent

Stuart_Armstrong, 12 Feb 2015 19:31 UTC

LW: 5 AF: 3

What links here?

Forum Digest: Corrigibility, utility indifference, & related control ideas by Benya_Fallenstein (24 Mar 2015 17:39 UTC; 35 points)

IAFF-User-111 29 Jan 2016 6:09 UTC
0 points
Doesn’t seem workable to me: being “completely ignorant” suggests an improper prior. An agent with a proper prior over its utility function can integrate over it and maximize expected utility, and which action maximizes expected utility will depend on this prior.