Satisficers’ undefined behaviour
I previously posted an example of a satisficer (an agent seeking to achieve a certain level of expected utility u) transforming itself into a maximiser (an agent wanting to maximise expected u) to better achieve its satisficing goals.
But the real problem with satisficers isn’t that they “want” to become maximisers; the real problem is that their behaviour is undefined. We conceive of them as agents that would do the minimum required to reach a certain goal, but we don’t specify “minimum required”.
For example, let A be a satisficing agent. It has a utility u that is quadratic in the number of paperclips it builds, except that after building $10^{100}$, it gets a special extra exponential reward, until $10^{1000}$, where the extra reward becomes logarithmic, and after $10^{10000}$, it also gets utility in the number of human frowns divided by 3↑↑↑3 (unless someone gets tortured by dust specks for 50 years).
A’s satisficing goal is a minimum expected utility of 0.5, and, in one minute, the agent can press a button to create a single paperclip.
So pressing the button is enough. In the coming minute, A could decide to transform itself into a u-maximiser (as that still ensures the button gets pressed). But it could also do a lot of other things. It could transform itself into a v-maximiser, for many different v’s (generally speaking, given any v, either v or -v will result in the button being pressed). It could break out, send a subagent to transform the universe into cream cheese, and then press the button. It could rewrite itself into a dedicated button-pressing agent. It could write a giant Harry Potter fanfic, force people on Reddit to come up with creative solutions for pressing the button, and then implement the best one.
All these actions are possible for a satisficer, and are completely compatible with its motivations. This is why satisficers are un(der)defined, and why any behaviour we want from them (such as “minimum required” impact) has to be put in deliberately.
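To make the underdetermination concrete, here is a minimal Python sketch contrasting the two decision rules. It assumes a finite list of candidate policies, each tagged with a single expected-utility number; the policy names echo the examples above, but the numbers are hypothetical placeholders, not anything derived from u. The maximiser rule returns one policy, while the satisficer rule only returns the set of policies clearing the 0.5 threshold and says nothing about which of them to pick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    expected_u: float  # hypothetical placeholder value, not computed from u


THRESHOLD = 0.5  # the satisficing level from the example


def maximiser_choice(policies):
    # A u-maximiser has a unique target: the policy with the highest expected u.
    return max(policies, key=lambda p: p.expected_u)


def satisficer_acceptable(policies, threshold=THRESHOLD):
    # A satisficer only demands expected u >= threshold. This returns a set of
    # acceptable policies; the criterion itself contains no tie-breaking rule.
    return [p for p in policies if p.expected_u >= threshold]


# Illustrative candidates; all expected-utility values are assumptions.
policies = [
    Policy("do nothing", 0.0),
    Policy("press the button", 1.0),
    Policy("rewrite self as a dedicated button-pressing agent", 1.0),
    Policy("turn the universe into cream cheese, then press", 1.0),
    Policy("become a u-maximiser", 1e6),
]

acceptable = satisficer_acceptable(policies)
print([p.name for p in acceptable])
print(maximiser_choice(policies).name)
```

Every policy except "do nothing" clears the threshold, including the u-maximiser and cream-cheese routes. Any rule for choosing among them (first in the list, random, lowest impact) is an extra assumption layered on top of the satisficing criterion.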
I’ve got some ideas for how to achieve this, which I’m posting here.