Sure, it’s more of a reframing of the question in a direction where I’m aware of an interesting answer. Specifically, since you mentioned alignment problems, satisficers sound like something that should fight goodharting, and that might require awareness of the scope of robustness, not just optimizing less forcefully.
Looking at the question more closely, one problem is that, the way you are talking about a satisficer, it might have a different type signature from EU maximizers. (Unlike expected utility maximizers, “satisficers” don’t have a standard definition.) An EU maximizer can compare events (parts of the sample space) and choose the one with the highest expected utility, which is equivalent to a coherent preference between such events. So an EU agent is not just taking actions in individual possible worlds, the points of the sample space (which the utility function evaluates on). Instead it’s taking actions in possible “decision situations” (which are not the same thing as possible worlds or events) that offer a choice between multiple events in the sample space, each event representing uncertainty about possible worlds, and with no opportunity to choose outcomes that are not on offer in that particular “decision situation”.
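To make the type signature concrete, here is a minimal sketch (all names and the representation of events as finite probability distributions over worlds are my own assumptions, not a standard formalization): the EU maximizer takes a menu of events, the "decision situation", and can only pick among what's on offer.

```python
from typing import Callable, Dict, List

World = str
Event = Dict[World, float]          # an event as a probability distribution over worlds
Utility = Callable[[World], float]  # utility is evaluated on points (worlds)

def expected_utility(event: Event, u: Utility) -> float:
    # Expectation of u over the worlds the event assigns probability to.
    return sum(p * u(w) for w, p in event.items())

def eu_maximizer(menu: List[Event], u: Utility) -> Event:
    # The agent acts in a "decision situation": it compares the events on
    # offer and returns one with maximal expected utility. It cannot choose
    # an outcome outside the menu.
    return max(menu, key=lambda e: expected_utility(e, u))
```

Note that `u` is typed on worlds while `eu_maximizer` is typed on menus of events, which is the type-signature distinction in question.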
But a satisficer, under a minimal definition, just picks a point of the space, instead of comparing given events (subspaces). For example, if given a choice among events that all have very high expected utility (higher than the satisficer’s threshold), what is the satisficer going to do? Perhaps it should choose the option with the least expected utility, but that’s unclear (and likely doesn’t result in utility maximization for any utility function, or in anything reasonable from the alignment point of view). So the problem seems underspecified.
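The gap shows up as soon as you try to write the satisficer down in the same event-menu setting as above (again a sketch with assumed names, reusing the same toy representation of events as finite distributions over worlds): once several offered events clear the threshold, the definition is silent on which to return.

```python
from typing import Callable, Dict, List

World = str
Event = Dict[World, float]  # an event as a probability distribution over worlds

def expected_utility(event: Event, u: Callable[[World], float]) -> float:
    return sum(p * u(w) for w, p in event.items())

def satisficer(menu: List[Event], u: Callable[[World], float],
               threshold: float) -> Event:
    # Keep only the offered events whose expected utility clears the threshold.
    acceptable = [e for e in menu if expected_utility(e, u) >= threshold]
    # When several events clear the threshold, the minimal definition says
    # nothing about which to pick. "Least expected utility" is one arbitrary
    # convention, used here only to make the underspecification concrete.
    return min(acceptable, key=lambda e: expected_utility(e, u))
```

With a low enough threshold this agent deliberately returns the worst acceptable option, which illustrates why no utility function is being maximized by the tie-breaking rule itself.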