Alternatively, a satisficer could build a maximiser: for example, if you don’t give it the ability to modify its own code, it could build a maximiser rather than become one. It might also build a paperclip-making von Neumann machine that isn’t anywhere near a maximiser, but is still insanely dangerous.
I notice a satisficing agent isn’t well-defined. What happens when it has two ways of satisfying its goals? It may be possible to make a safe one if you come up with a good enough answer to that question.
What I usually mean by it is: maximise until some specified criterion is satisfied—and then stop.
However, perhaps “satisficing” is not quite the right word for this. IMO, agents that stop are an important class of agents. I think we need a name for them—and this is one of the nearest things. In my essay, I called them “Stopping superintelligences”.
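Read as pseudocode, that stopping rule is roughly the following minimal sketch; the toy action set and the threshold of ten paperclips are illustrative assumptions, not anything specified in this thread.

```python
# Minimal sketch of "maximise until some specified criterion is satisfied,
# then stop". The toy action set and the threshold of 10 paperclips are
# illustrative assumptions, not details from this discussion.

ACTIONS = {"wait": 0, "make_one": 1, "make_batch": 3}  # paperclips gained

def stopping_agent(threshold=10, max_steps=100):
    paperclips = 0
    for _ in range(max_steps):
        if paperclips >= threshold:
            return paperclips  # criterion satisfied: halt instead of continuing
        # Until satisfied, behave like a maximiser: greedily take the action
        # with the largest immediate gain.
        best_action = max(ACTIONS, key=ACTIONS.get)
        paperclips += ACTIONS[best_action]
    return paperclips

print(stopping_agent())  # 12: stops just past the threshold, not at the step limit
```

The only difference from an unconditional maximiser is the early return; without it, the loop keeps converting whatever it can reach into paperclips.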
> What happens when it has two ways of satisfying its goals?

That’s the same as with a maximiser.

Except much more likely to come up; a maximiser facing many exactly balanced strategies in the real world is a rare occurrence.

Well, usually you want satisfaction rapidly—and then things are very similar again.

Then state that. It’s an inverse-of-time-until-satisfaction-is-complete maximiser.
The way you defined satisfaction doesn’t really work with that. The satisficer might just decide that it has a 90% chance of producing 10 paperclips, and thus that its goal is complete. There is some chance of it failing its goal later on, but that is likely to be made up for by the fact that it will probably exceed its goal by some margin, especially if it can self-modify.
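To make that objection concrete, here is a hedged sketch combining the “inverse of time until satisfaction is complete” score with satisfaction judged in expectation; the 90% plan, the threshold of nine paperclips, and every name here is invented for illustration.

```python
# Sketch of the loophole described above; all numbers and names are invented
# illustrations, not part of the original discussion.

def inverse_time_utility(steps_until_satisfied):
    # Proposed score: sooner satisfaction is better; never satisfied scores 0.
    # (Shifted by 1 so that immediate satisfaction doesn't divide by zero.)
    return 0.0 if steps_until_satisfied is None else 1.0 / (1 + steps_until_satisfied)

def expected_clips(prob_success, clips_if_success):
    return prob_success * clips_if_success

# If "satisfied" is judged in expectation, the agent can decide it is already
# done: believing one cheap plan has a 90% chance of yielding 10 paperclips,
# it sees an expected output of 9, which meets a threshold of 9 at step zero.
threshold = 9
belief = {"prob_success": 0.9, "clips_if_success": 10}
if expected_clips(**belief) >= threshold:
    print("satisfied in expectation at step 0, score =", inverse_time_utility(0))
    # prints a score of 1.0, the maximum possible, with nothing actually built
```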
> Alternatively, a satisficer could build a maximiser.
Yep. Coding “don’t unleash (or become) a maximiser or something similar” is very tricky.
> I notice a satisficing agent isn’t well-defined. What happens when it has two ways of satisfying its goals? It may be possible to make a safe one if you come up with a good enough answer to that question.
It may be. But encoding “safe” for a satisficer sounds like it’s probably just as hard as constructing a safe utility function in the first place.