Not all non-maximisers become maximisers

Because of the problems with optimisers, there’s a search for other types of designs that can accomplish goals without such extreme optimisation pressure. Satisficers are one example of such designs, but satisficers aren’t reflectively stable (a satisficer can design a successor agent that isn’t a satisficer) and aren’t even reflectively consistent (there are situations where a satisficer will prefer to become a maximiser rather than remain a satisficer).

The general failure mode of an other-ising design is that it turns into a maximiser.


Here’s a design that doesn’t turn into a maximiser, at least not directly. It isn’t useful for anything in itself, but it does illustrate some of the odd behaviours that certain agents can indulge in.

Doubly-bounded satisficer

A doubly-bounded satisficer (DBS) is one that has a utility function $u$ and two bounds $l < h$. Its aim is to take actions that ensure the expected utility of $u$ is bounded between $l$ and $h$, i.e. $l \le \mathbb{E}(u) \le h$.

It’s clear that, in general, the DBS won’t make itself into a $u$-maximiser (since that might have an expectation above $h$) or into a $u$-minimiser (since that might have an expectation below $l$). But how exactly will it behave?

Assume for illustration purposes that $h = l + \epsilon$, and that $\epsilon$ is very small.
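As a minimal sketch of this aim (assuming the agent can enumerate candidate policies and evaluate its own estimate of the expected utility of each; the names `policies` and `expected_u` are just stand-ins for the agent’s policy space and its world-model), a DBS is content with any policy whose expected utility falls inside the band:

```python
def acceptable_policies(policies, expected_u, l, h):
    """Policies a doubly-bounded satisficer is content with: l <= E[u | policy] <= h.

    `policies` is any iterable of candidate policies, and `expected_u(pi)` is the
    agent's own estimate of the expected utility of following policy pi (both are
    hypothetical stand-ins for the agent's policy space and world-model).
    """
    return [pi for pi in policies if l <= expected_u(pi) <= h]
```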

No stochasticity

If there is no stochasticity or uncertainty, the DBS knows that for every policy $\pi$ there is a corresponding value of $u$. Therefore it will select a policy $\pi$ such that $l \le u(\pi) \le h$.

Alternatively, it can simply turn itself into a $v$-maximiser, where $v = 1$ iff $l \le u \le h$, and $v = 0$ otherwise.
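As a quick sketch of that reduction (in a deterministic world every policy leads to a known utility; the policy names and $u$-values below are made up purely for illustration):

```python
# In a deterministic world each policy pi yields a known utility u(pi), so
# maximising the indicator utility v selects a policy inside [l, h]
# whenever at least one such policy exists.
l, h = 0.5, 0.6                                   # illustrative bounds
u_of = {"pi_1": 0.3, "pi_2": 0.55, "pi_3": 0.9}   # made-up u-values per policy

def v(u):
    return 1 if l <= u <= h else 0                # v = 1 iff l <= u <= h

best = max(u_of, key=lambda pi: v(u_of[pi]))
print(best, u_of[best])                           # pi_2 0.55 -- inside the band
```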

Stochastic choice between outcomes

Let’s add some uncertainty to the setup. The DBS knows that two numbers $x$ and $y$ will be drawn independently and uniformly at random from $[0,1]$, and that it will then have the choice between actions $a_x$ and $a_y$. Then $u(a_x) = x$ and $u(a_y) = y$.

Set $l$ equal to $1/2$

In this case, maximising $v$ (and breaking ties randomly) is still a good policy for the agent. But so is randomly choosing actions! Both of these have an expected $u$ of $1/2$ (up to terms of order $\epsilon$).

In contrast, maximising $v$ and breaking ties by choosing the highest-$u$ action has an expected utility of around $2/3$. So it’s not the $v$-maximising that’s doing the job here.
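To make the comparison concrete, here is a small Monte Carlo check of the three policies just described (a sketch, using $l = 1/2$ with a tiny band width; the function names are just for illustration):

```python
import random

L, EPS = 0.5, 1e-3        # illustrative bounds: l = 1/2, h = l + epsilon
H = L + EPS

def v(u):
    """Indicator utility from the deterministic section: 1 iff l <= u <= h."""
    return 1 if L <= u <= H else 0

def pick_randomly(x, y):
    return random.choice([x, y])

def max_v_random_ties(x, y):
    if v(x) != v(y):
        return x if v(x) > v(y) else y
    return random.choice([x, y])      # tie: choose randomly

def max_v_highest_u_ties(x, y):
    if v(x) != v(y):
        return x if v(x) > v(y) else y
    return max(x, y)                  # tie: choose the higher-u action

def expected_u(policy, trials=200_000):
    total = 0.0
    for _ in range(trials):
        x, y = random.random(), random.random()
        total += policy(x, y)
    return total / trials

for name, pol in [("random choice", pick_randomly),
                  ("max v, random ties", max_v_random_ties),
                  ("max v, highest-u ties", max_v_highest_u_ties)]:
    print(f"{name:>22}: E[u] ~ {expected_u(pol):.3f}")
# Approximate output: 0.500, 0.500, 0.667 -- only the first two stay in the band.
```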

Set $l$ equal to $2/3$

In this case, maximising $u$ straight up is a good policy (and all good policies are close to that one). But that’s an artefact of these particular values of $l$ and $h$.
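For reference, that $2/3$ is just the expectation of the larger of two independent uniform draws:

```latex
\[
\Pr\big(\max(X,Y)\le t\big)=t^{2}\quad\text{for }X,Y\sim U[0,1]\text{ i.i.d.}
\;\Longrightarrow\;
\mathbb{E}\big[\max(X,Y)\big]=\int_{0}^{1} t\cdot 2t\,\mathrm{d}t=\tfrac{2}{3}.
\]
```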

Set $l$ equal to $25/48$

In this case, one good policy is:

1. If $x$ and $y$ are both above $1/2$, choose the highest option.
2. Otherwise, choose randomly.

This policy has an expectation of $25/48$, but there is no natural utility function whose maximisation corresponds to this policy.
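Here is a quick Monte Carlo check of that policy (a sketch, using the threshold of $1/2$ from the list above; the function names are just for illustration):

```python
import random

THRESHOLD = 0.5   # threshold used in the two-step policy above

def threshold_policy(x, y):
    # Both draws above the threshold: take the larger one; otherwise choose randomly.
    if x > THRESHOLD and y > THRESHOLD:
        return max(x, y)
    return random.choice([x, y])

def expected_u(policy, trials=500_000):
    total = 0.0
    for _ in range(trials):
        x, y = random.random(), random.random()
        total += policy(x, y)
    return total / trials

print(expected_u(threshold_policy))   # ~ 25/48, i.e. about 0.521
```

Moving the threshold up or down sweeps the expectation anywhere between $1/2$ and $2/3$, so the same construction works for other targets in that range.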

Value of information (and ignorance)

If the agent can self-modify before receiving any extra information, then that information always has non-negative value (trivial proof: the agent can self-modify into an agent that ignores the extra information, if need be).

But if the agent cannot self-modify, then it might prefer not to know the information, depending on how it breaks ties and other considerations. For instance, in the $l = 1/2$ example above, an agent hard-wired to maximise expected $v$ and break ties towards higher expected $u$ chooses essentially at random while ignorant of $x$ and $y$ (expected $u$ of $1/2$), but drives its expected $u$ up to around $2/3$ once it learns them.

Generalisations

We can generalise this to a “satisficer” that wants expected utility to lie in a certain subset $S$ of the range of $u$, not necessarily an interval (if $S$ has a hole in it, it becomes even clearer why an agent might want to avoid information: a mixture of compatible policies need not be a compatible policy). For instance, if $S = [0.4, 0.45] \cup [0.55, 0.6]$, then a 50-50 mixture of a policy with expected utility $0.42$ and one with expected utility $0.58$ has expected utility $0.5$, which lies outside $S$.

Anyway, this agent is somewhat odd, but it’s an interesting agent that doesn’t immediately want to become a maximiser.
