Alternatively, a satisficer could build a maximiser.
Yep. Coding “don’t unleash (or become) a maximiser or something similar” is very tricky.
I notice a satisficing agent isn’t well-defined. What happens when it has two ways of satisfying its goals? It may be possible to make a safe one if you come up with a good enough answer to that question.
It may be. But encoding “safe” for a satisficer sounds like it’s probably just as hard as constructing a safe utility function in the first place.
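The underdetermination point above can be made concrete with a toy sketch (hypothetical, not from the thread): a maximiser's choice is pinned down by its utility function, while a satisficer's choice among threshold-meeting options is left open.

```python
def maximise(options, utility):
    # A maximiser picks the single option with the highest utility.
    return max(options, key=utility)

def satisfice(options, utility, threshold):
    # A satisficer accepts ANY option meeting the threshold; when several
    # do, its choice among them is underdetermined -- the ambiguity raised above.
    return [o for o in options if utility(o) >= threshold]

options = ["a", "b", "c"]
utility = {"a": 0.2, "b": 0.9, "c": 0.95}.get

print(maximise(options, utility))        # -> c
print(satisfice(options, utility, 0.8))  # -> ['b', 'c']  (two ways to satisfy the goal)
```

Note that "build a maximiser" could itself be one of the acceptable options here, which is what makes ruling it out so tricky.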