Mild optimization

“Mild optimization” is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it’s not optimizing that hard. It’s okay with just painting one car pink; it isn’t driven to max out the twentieth decimal place of its car-painting score.

Other suggested terms for this concept have included “soft optimization”, “sufficient optimization”, “minimum viable solution”, “pretty good optimization”, “moderate optimization”, “regularized optimization”, “sensible optimization”, “casual optimization”, “adequate optimization”, “good-not-great optimization”, “lenient optimization”, “parsimonious optimization”, and “optimehzation”.

Difference from low impact

Mild optimization is complementary to taskiness and low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies.

What we really want is both properties. We want the AGI to paint one car pink in a way that gets the impact pretty low and then, you know, that’s good enough—not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. That kind of extreme search pressure would tend to break a low impact measure which contained even a subtle flaw, whereas a mild-optimizing AGI might not put as much pressure on the low impact measure and hence be less likely to break it.

(Obviously, what we want is a perfect low impact measure which will keep us safe even if subjected to unlimited optimization power, but a basic security mindset is to try to make each part safe on its own, then assume it might contain a flaw and try to design the rest of the system to be safe anyway.)

Difference from satisficing

Satisficing utility functions don’t necessarily mandate or even allow mildness.

Suppose the AI’s utility function is 1 when at least one car has been painted pink and 0 otherwise—there’s no more utility to be gained by outcomes in which more cars have been painted pink. Will this AI still go to crazy-seeming lengths?

Yes, because in a partially uncertain / probabilistic environment, the expected utility can still almost always be pushed a little higher. A solution with a 0.9999 probability of painting at least one car pink is ranked above a solution with a 0.999 probability of painting at least one car pink.

If a preference ordering has the property that, for every achievable probability distribution over outcomes, there is a preferred distribution which requires one more erg of energy to achieve, this is a sufficient condition for using up all the energy in the universe. If converting all reachable matter into pink-painted cars implies a slightly higher probability that at least one car is pink, then that is the expected-utility maximum under the 0-1 utility function.

Less naive satisficing would describe an optimizer which satisfies an expected utility constraint—say, if any policy produces at least 0.95 expected utility under the 0-1 utility function, the AI can implement that policy.

This rule is now a Task and would at least permit mild optimization. The problem is that it doesn’t exclude extremely optimized solutions. A 0.99999999 probability of producing at least one pink-painted car also has the property that it’s above a 0.95 probability. If you’re a self-modifying satisficer, replacing yourself with a maximizer is probably a satisficing solution.
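
To make the problem concrete, here is a minimal sketch in Python; the policy names and numbers are invented for illustration, and “expected utility” is simply the probability of at least one pink car under the 0-1 utility function:

```python
import random

# Hypothetical policies and their probability of producing at least one pink car,
# which equals their expected utility under the 0-1 utility function.
policies = {
    "paint one car and stop":            0.96,
    "paint one car, double-check twice": 0.999,
    "convert all reachable matter":      0.9999999,  # the maximizer's favorite
    "do nothing":                        0.0,
}

def satisficing_policies(policies, threshold=0.95):
    """Return every policy whose expected utility clears the threshold."""
    return [name for name, eu in policies.items() if eu >= threshold]

acceptable = satisficing_policies(policies)
print(acceptable)                 # the extreme policy clears the bar too
print(random.choice(acceptable))  # nothing in the rule itself steers away from it
```

The constraint rules out “do nothing”, but it says nothing about which of the remaining policies to pick, so an agent that happens to pick the most extreme one (or self-modifies into something that picks it) has still satisficed.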

Even if we’re not dealing with a completely self-modifying agent, there’s a ubiquity of points where adding more optimization pressure might satisfice. When you build a thermostat in the environment, you’re coercing one part of the environment to have a particular temperature; if this kind of thing doesn’t count as “more optimization pressure”, then we could be dealing with all sorts of additional optimizing-ness that falls short of constructing a full subagent or doing a full self-modification. There are all sorts of steps in cognition where it would be just as easy to add a maximizing step (take the highest-ranking solution) as to take a random high-ranking solution.

On a higher level of abstraction, the problem is that while satisficing is reflectively consistent, it’s not reflectively stable. A satisficing agent is happy to construct another satisficing agent, but it may also be happy to construct a maximizing agent. It can approve its current mode of thinking, but it approves other modes of thinking too. So unless all the cognitive steps are being carried out locally on fixed known algorithms that satisfice but definitely don’t maximize, without the AGI constructing any environmental computations or conditional policy steps more complicated than a pocket calculator, building a seemingly mild satisficer doesn’t guarantee that optimization stays mild.

Quantilizing

One weird idea that seems like it might exhibit incremental progress toward reflectively stable mild optimization is Jessicat’s expected utility quantilizer. Roughly, a quantilizer estimates expected outcomes relative to a null action, and then tries to produce an expected outcome in some upper quantile of possibilities—e.g., an outcome in the top 1% of expected outcomes. Furthermore, a quantilizer only tries to narrow outcomes by that much—it doesn’t try to produce one particular outcome in the top 1%; the most it will ever try to do is randomly pick an outcome such that this random distribution corresponds to being in the top 1% of expected outcomes.
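
As a rough sketch of the sampling rule (this is not the implementation from the quantilizer paper; the action set, base weights, and utility estimates are stand-ins for what would come from the agent’s world model):

```python
import random

def quantilize(actions, base_weights, expected_utility, q=0.01):
    """Sample an action from the top q fraction of the base distribution's
    probability mass, ranked by expected utility, keeping probabilities within
    the kept set proportional to the base distribution."""
    ranked = sorted(actions, key=expected_utility, reverse=True)
    total = sum(base_weights[a] for a in actions)

    # Keep the best actions until a q fraction of base probability mass is
    # accumulated (the boundary action is kept whole here for simplicity).
    kept, mass = [], 0.0
    for a in ranked:
        kept.append(a)
        mass += base_weights[a] / total
        if mass >= q:
            break

    # Sample within the kept set proportionally to the base weights, rather
    # than taking the single top-ranked action.
    weights = [base_weights[a] for a in kept]
    return random.choices(kept, weights=weights, k=1)[0]

# Toy usage: rare extreme actions stay rare even when they score highest.
actions = ["paint one car", "paint two cars", "tile the galaxy with pink cars"]
base = {"paint one car": 0.7, "paint two cars": 0.29, "tile the galaxy with pink cars": 0.01}
utility = {"paint one car": 0.9, "paint two cars": 0.95, "tile the galaxy with pink cars": 0.9999}
print(quantilize(actions, base, utility.get, q=0.3))
```

Sampling proportionally to the base weights is what keeps the agent from concentrating probability on any single action more than 1/q times as much as the null distribution would.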

Quantilizing corresponds to maximizing expected utility under the assumption that there is uncertainty about which outcomes are good and an adversarial process which can make some outcomes arbitrarily bad, subject to the constraint that the expected utility of the null action can only be boundedly low. So if there’s an outcome which would be very improbable given the status quo, the adversary can make that outcome be very bad. This means that rather than aiming for one single high-utility outcome which the adversary could then make very bad, a quantilizer tries for a range of possible good outcomes. This in turn means that quantilizers will actively avoid narrowing down the future too much, even if by doing so they’d enter regions of very high utility.
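
Sketching the usual argument in symbols (the notation here is mine, not from the original writeups): let γ be the null/base distribution over actions, let Q_q be the q-quantilizer that samples from the top q fraction of γ’s probability mass ranked by expected utility, and let C ≥ 0 be an adversarially chosen cost function whose expectation under γ is at most c. Since the quantilizer never upweights any action by more than a factor of 1/q relative to the base distribution,

$$
Q_q(a) \;\le\; \frac{\gamma(a)}{q}
\qquad\Longrightarrow\qquad
\mathbb{E}_{a \sim Q_q}\!\left[C(a)\right]
\;=\; \sum_a Q_q(a)\,C(a)
\;\le\; \frac{1}{q}\sum_a \gamma(a)\,C(a)
\;=\; \frac{1}{q}\,\mathbb{E}_{a \sim \gamma}\!\left[C(a)\right]
\;\le\; \frac{c}{q}.
$$

So however the adversary arranges the hidden costs, the quantilizer’s expected cost is at most 1/q times that of the status quo distribution, which is the formal sense in which it refuses to narrow down the future too far.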

Quantilization doesn’t seem like exactly what we actually want, for multiple reasons. E.g., if long-run good outcomes are very improbable given the status quo, it seems like a quantilizer would tend to avoid policies that achieve them (a similar problem seems like it might appear in impact measures which imply that good long-run outcomes are high-impact).

The key idea that appears in quantilizing is that a quantilizer isn’t just as happy to rewrite itself as a maximizer, and isn’t just as happy to implement a policy that involves constructing a more powerful optimizer in the environment.

Relation to other problems

Mild optimization relates directly to one of the three core reasons why aligning at-least-partially superhuman AGI is hard—making very powerful optimization pressures flow through the system puts a lot of stress on its potential weaknesses and flaws. To the extent we can get mild optimization stable, it might take some of the critical-failure pressure off other parts of the system. (Though again, basic security mindset says to still try to get all the parts of the system as flawless as possible and not tolerate any known flaws in them, then build the fallback options in case they’re flawed anyway; one should not deliberately rely on the fallbacks and intend them to be activated.)

Mild optimization seems strongly complementary to low impact and taskiness. Something that’s merely low-impact might exhibit pathological behavior from trying to drive side impacts down to absolutely zero. Something that merely optimizes mildly might find some ‘weak’ or ‘not actually trying that hard’ solution which nonetheless ended up turning the galaxies into pink-painted cars. Something that has a satisfiable utility function with a readily achievable maximum might still go to tremendous lengths to drive the probability of achieving that maximum to nearly 1. Something that optimizes mildly and has a low impact penalty and has a small, clearly achievable goal, seems much more like the sort of agent that might, you know, just paint the damn car pink and then stop.

Mild optimization can be seen as a further desideratum of the currently open Other-izer Problem: Besides being workable for bounded agents, and being reflectively stable, we’d also like an other-izer idiom to have a (stable) mildness parameter.

Approaches

It currently seems like the key subproblem in mild optimization revolves around reflective stability—we don’t want “replace the mild optimization part with a simple maximizer, becoming a maximizer isn’t that hard and gets the task done” to count as a ‘mild’ solution. Even in human intuitive terms of “optimizing without putting in an unreasonable amount of effort”, at some point a sufficiently advanced human intelligence gets lazy and starts building an AGI to do things for them because it’s easier that way and only takes a bounded amount of effort. We don’t want “construct a second AGI that does hard optimization” to count as mild optimization even if it ends up not taking all that much effort for the first AGI, although “construct an AGI that itself optimizes only that mildly” could potentially count as a correspondingly mild solution.

Similarly, we don’t want to allow the deliberate creation of environmental or internal daemons even if it’s easy to do it that way or requires low effort to end up with that side effect—we’d want the optimizing power of such daemons to count toward the measured optimization power, so that plans which create them get rejected as optimizing too hard.

Since both of these phenomena seem hard to exhibit in current machine learning algorithms or faithfully represent in a toy problem, unbounded analysis seems likely to be the main way to go. In general, it seems closely related to the Other-izer Problem which also seems most amenable to unbounded analysis at the present time.

Tagged posts

Soft optimization makes the value target bigger (Jeremy Gillen, 2 Jan 2023)
When to use quantilization (RyanCarey, 5 Feb 2019)
Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) (RogerDearnaley, 25 May 2023)
Satisficers want to become maximisers (Stuart_Armstrong, 21 Oct 2011)
[Question] Why don’t quantilizers also cut off the upper end of the distribution? (Alex_Altair, 15 May 2023)
Quantilal control for finite MDPs (Vanessa Kosoy, 12 Apr 2018)
Quantilizers maximize expected utility subject to a conservative cost constraint (jessicata, 28 Sep 2015)
Optimization Regularization through Time Penalty (Linda Linsefors, 1 Jan 2019)
Stable Pointers to Value III: Recursive Quantilization (abramdemski, 21 Jul 2018)
Thoughts on Quantilizers (Stuart_Armstrong, 2 Jun 2017)
Steam (abramdemski, 20 Jun 2022)
How to safely use an optimizer (Simon Fischer, 28 Mar 2024)
[Aspiration-based designs] 2. Formal framework, basic algorithm (28 Apr 2024)
Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs (Roland Pihlakas, 22 Jun 2025)
AISC project: SatisfIA – AI that satisfies without overdoing it (Jobst Heitzig, 11 Nov 2023)
Thinking about maximization and corrigibility (James Payor, 21 Apr 2023)
[Aspiration-based designs] 1. Informal introduction (28 Apr 2024)
Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges. (Roland Pihlakas, 12 Jan 2025)
AISC team report: Soft-optimization, Bayes and Goodhart (27 Jun 2023)
Aspiration-based Q-Learning (27 Oct 2023)
Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning (Roger Dearnaley, 21 Feb 2023)
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning (Roman Leventov, 12 Jan 2023)
The Optimizer’s Curse and How to Beat It (lukeprog, 16 Sep 2011)
Validator models: A simple approach to detecting goodharting (beren, 20 Feb 2023)
Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue) (16 Mar 2025)
Exploring Mild Behaviour in Embedded Agents (Megan Kinniment, 27 Jun 2022)