Mild optimization

“Mild optimization” is where, if you ask a Task AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it’s not optimizing that hard. It’s okay with just painting one car pink; it isn’t driven to max out the twentieth decimal place of its car-painting score.

Other suggested terms for this concept have included “soft optimization”, “sufficient optimization”, “minimum viable solution”, “pretty good optimization”, “moderate optimization”, “regularized optimization”, “sensible optimization”, “casual optimization”, “adequate optimization”, “good-not-great optimization”, “lenient optimization”, “parsimonious optimization”, and “optimehzation”.

Difference from low impact

Mild optimization is complementary to taskiness and low impact. A low impact AGI might try to paint one car pink while minimizing its other footprint or how many other things changed, but it would be trying as hard as possible to minimize that impact and drive it down as close to zero as possible, which might come with its own set of pathologies.

What we really want is both properties. We want the AGI to paint one car pink in a way that gets the impact pretty low and then, you know, that’s good enough—not have a cognitive pressure to search through weird extremes looking for a way to decrease the twentieth decimal place of the impact. That kind of extreme search pressure would tend to break a low impact measure which contained even a subtle flaw, whereas a mild-optimizing AGI might not put as much pressure on the low impact measure and hence be less likely to break it.

(Obviously, what we want is a perfect low impact measure which will keep us safe even if subjected to unlimited optimization power, but a basic security mindset is to try to make each part safe on its own, then assume it might contain a flaw and try to design the rest of the system to be safe anyway.)

Difference from satisficing

Satisficing utility functions don’t necessarily mandate or even allow mildness.

Suppose the AI’s utility function is 1 when at least one car has been painted pink and 0 otherwise—there’s no more utility to be gained by outcomes in which more cars have been painted pink. Will this AI still go to crazy-seeming lengths?

Yes, because in a partially uncertain / probabilistic environment, the expected utility can still almost always be pushed a little higher. A solution with a 0.9999 probability of painting at least one car pink is ranked above a solution with a 0.999 probability of painting at least one car pink.

If a preference ordering has the property that, for every achievable probability distribution over outcomes, there is a preferred distribution which requires one more erg of energy to achieve, this is a sufficient condition for using up all the energy in the universe. If converting all reachable matter into pink-painted cars implies a slightly higher probability that at least one car is pink, then that is the expected-utility maximum under the 0-1 utility function.

Less naive satisficing would describe an optimizer which satisfies an expected utility constraint—say, if any policy produces at least 0.95 expected utility under the 0-1 utility function, the AI can implement that policy.

This rule is now a Task and would at least permit mild optimization. The problem is that it doesn’t exclude extremely optimized solutions. A 0.99999999 probability of producing at least one pink-painted car also has the property that it’s above a 0.95 probability. If you’re a self-modifying satisficer, replacing yourself with a maximizer is probably a satisficing solution.
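
To make the problem concrete, here is a minimal sketch in Python; the policy names and numbers are invented for illustration, and “expected utility” is simply the probability of at least one pink car under the 0-1 utility function:

```python
import random

# Hypothetical policies and their probability of producing at least one pink car,
# which equals their expected utility under the 0-1 utility function.
policies = {
    "paint one car and stop":            0.96,
    "paint one car, double-check twice": 0.999,
    "convert all reachable matter":      0.9999999,  # the maximizer's favorite
    "do nothing":                        0.0,
}

def satisficing_policies(policies, threshold=0.95):
    """Return every policy whose expected utility clears the threshold."""
    return [name for name, eu in policies.items() if eu >= threshold]

acceptable = satisficing_policies(policies)
print(acceptable)                 # the extreme policy clears the bar too
print(random.choice(acceptable))  # nothing in the rule itself steers away from it
```

The constraint rules out “do nothing”, but it says nothing about which of the remaining policies to pick, so an agent that happens to pick the most extreme one (or self-modifies into something that picks it) has still satisficed.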

Even if we’re not dealing with a completely self-modifying agent, there’s a ubiquity of points where adding more optimization pressure might satisfice. When you build a thermostat in the environment, you’re coercing one part of the environment to have a particular temperature; if this kind of thing doesn’t count as “more optimization pressure”, then we could be dealing with all sorts of additional optimizing-ness that falls short of constructing a full subagent or doing a full self-modification. There are all sorts of steps in cognition where it would be just as easy to add a maximizing step (take the highest-ranking solution) as to take a random high-ranking solution.

On a higher level of abstraction, the problem is that while satisficing is reflectively consistent, it’s not reflectively stable. A satisficing agent is happy to construct another satisficing agent, but it may also be happy to construct a maximizing agent. It can approve its current mode of thinking, but it approves other modes of thinking too. So unless all the cognitive steps are being carried out locally on fixed known algorithms that satisfice but definitely don’t maximize, without the AGI constructing any environmental computations or conditional policy steps more complicated than a pocket calculator, building a seemingly mild satisficer doesn’t guarantee that optimization stays mild.

Quantilizing

One weird idea that seems like it might exhibit incremental progress toward reflectively stable mild optimization is Jessicat’s expected utility quantilizer. Roughly, a quantilizer estimates expected outcomes relative to a null action, and then tries to produce an expected outcome in some upper quantile of possibilities—e.g., an outcome in the top 1% of expected outcomes. Furthermore, a quantilizer only tries to narrow outcomes by that much—it doesn’t try to produce one particular outcome in the top 1%; the most it will ever try to do is randomly pick an outcome such that this random distribution corresponds to being in the top 1% of expected outcomes.
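
As a rough sketch of the sampling rule (this is not the implementation from the quantilizer paper; the action set, base weights, and utility estimates are stand-ins for what would come from the agent’s world model):

```python
import random

def quantilize(actions, base_weights, expected_utility, q=0.01):
    """Sample an action from the top q fraction of the base distribution's
    probability mass, ranked by expected utility, keeping probabilities within
    the kept set proportional to the base distribution."""
    ranked = sorted(actions, key=expected_utility, reverse=True)
    total = sum(base_weights[a] for a in actions)

    # Keep the best actions until a q fraction of base probability mass is
    # accumulated (the boundary action is kept whole here for simplicity).
    kept, mass = [], 0.0
    for a in ranked:
        kept.append(a)
        mass += base_weights[a] / total
        if mass >= q:
            break

    # Sample within the kept set proportionally to the base weights, rather
    # than taking the single top-ranked action.
    weights = [base_weights[a] for a in kept]
    return random.choices(kept, weights=weights, k=1)[0]

# Toy usage: rare extreme actions stay rare even when they score highest.
actions = ["paint one car", "paint two cars", "tile the galaxy with pink cars"]
base = {"paint one car": 0.7, "paint two cars": 0.29, "tile the galaxy with pink cars": 0.01}
utility = {"paint one car": 0.9, "paint two cars": 0.95, "tile the galaxy with pink cars": 0.9999}
print(quantilize(actions, base, utility.get, q=0.3))
```

Sampling proportionally to the base weights is what keeps the agent from concentrating probability on any single action more than 1/q times as much as the null distribution would.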

Quantilizing corresponds to maximizing expected utility under the assumption that there is uncertainty about which outcomes are good and an adversarial process which can make some outcomes arbitrarily bad, subject to the constraint that the expected utility of the null action can only be boundedly low. So if there’s an outcome which would be very improbable given the status quo, the adversary can make that outcome be very bad. This means that rather than aiming for one single high-utility outcome which the adversary could then make very bad, a quantilizer tries for a range of possible good outcomes. This in turn means that quantilizers will actively avoid narrowing down the future too much, even if by doing so they’d enter regions of very high utility.
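
Sketching the usual argument in symbols (the notation here is mine, not from the original writeups): let γ be the null/base distribution over actions, let Q_q be the q-quantilizer that samples from the top q fraction of γ’s probability mass ranked by expected utility, and let C ≥ 0 be an adversarially chosen cost function whose expectation under γ is at most c. Since the quantilizer never upweights any action by more than a factor of 1/q relative to the base distribution,

$$
Q_q(a) \;\le\; \frac{\gamma(a)}{q}
\qquad\Longrightarrow\qquad
\mathbb{E}_{a \sim Q_q}\!\left[C(a)\right]
\;=\; \sum_a Q_q(a)\,C(a)
\;\le\; \frac{1}{q}\sum_a \gamma(a)\,C(a)
\;=\; \frac{1}{q}\,\mathbb{E}_{a \sim \gamma}\!\left[C(a)\right]
\;\le\; \frac{c}{q}.
$$

So however the adversary arranges the hidden costs, the quantilizer’s expected cost is at most 1/q times that of the status quo distribution, which is the formal sense in which it refuses to narrow down the future too far.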

Quantilization doesn’t seem like exactly what we actually want, for multiple reasons. E.g., if long-run good outcomes are very improbable given the status quo, it seems like a quantilizer would tend to avoid policies that achieve them (a similar problem seems like it might appear in impact measures which imply that good long-run outcomes are high-impact).

The key idea that appears in quantilizing is that a quantilizer isn’t just as happy to rewrite itself as a maximizer, and isn’t just as happy to implement a policy that involves constructing a more powerful optimizer in the environment.

Relation to other problems

Mild optimization relates directly to one of the three core reasons why aligning at-least-partially superhuman AGI is hard—making very powerful optimization pressures flow through the system puts a lot of stress on its potential weaknesses and flaws. To the extent we can get mild optimization stable, it might take some of the critical-failure pressure off other parts of the system. (Though again, basic security mindset says to still try to get all the parts of the system as flawless as possible and not tolerate any known flaws in them, then build the fallback options in case they’re flawed anyway; one should not deliberately rely on the fallbacks and intend them to be activated.)

Mild optimization seems strongly complementary to low impact and taskiness. Something that’s merely low-impact might exhibit pathological behavior from trying to drive side impacts down to absolutely zero. Something that merely optimizes mildly might find some ‘weak’ or ‘not actually trying that hard’ solution which nonetheless ended up turning the galaxies into pink-painted cars. Something that has a satisfiable utility function with a readily achievable maximum might still go to tremendous lengths to drive the probability of achieving that maximum to nearly 1. Something that optimizes mildly and has a low impact penalty and has a small, clearly achievable goal, seems much more like the sort of agent that might, you know, just paint the damn car pink and then stop.

Mild optimization can be seen as a further desideratum of the currently open Other-izer Problem: Besides being workable for bounded agents, and being reflectively stable, we’d also like an other-izer idiom to have a (stable) mildness parameter.

Approaches

It currently seems like the key subproblem in mild optimization revolves around reflective stability—we don’t want “replace the mild optimization part with a simple maximizer, becoming a maximizer isn’t that hard and gets the task done” to count as a ‘mild’ solution. Even in human intuitive terms of “optimizing without putting in an unreasonable amount of effort”, at some point a sufficiently advanced human intelligence gets lazy and starts building an AGI to do things for them because it’s easier that way and only takes a bounded amount of effort. We don’t want “construct a second AGI that does hard optimization” to count as mild optimization even if it ends up not taking all that much effort for the first AGI, although “construct an AGI that itself optimizes only that mildly” could potentially count as a correspondingly mild solution.

Similarly, we don’t want to allow the deliberate creation of environmental or internal daemons even if it’s easy to do it that way or requires low effort to end up with that side effect—we’d want the optimizing power of such daemons to count toward the measured optimization power, so that plans which create them get rejected as optimizing too hard.

Since both of these phenomena seem hard to exhibit in current machine learning algorithms or faithfully represent in a toy problem, unbounded analysis seems likely to be the main way to go. In general, it seems closely related to the Other-izer Problem which also seems most amenable to unbounded analysis at the present time.

Tagged posts

Soft optimization makes the value target bigger (Jeremy Gillen, 2 Jan 2023)
When to use quantilization (RyanCarey, 5 Feb 2019)
Requirements for a STEM-capable AGI Value Learner (my Case for Less Doom) (RogerDearnaley, 25 May 2023)
Satisficers want to become maximisers (Stuart_Armstrong, 21 Oct 2011)
[Question] Why don’t quantilizers also cut off the upper end of the distribution? (Alex_Altair, 15 May 2023)
Quantilal control for finite MDPs (Vanessa Kosoy, 12 Apr 2018)
Quantilizers maximize expected utility subject to a conservative cost constraint (jessicata, 28 Sep 2015)
Optimization Regularization through Time Penalty (Linda Linsefors, 1 Jan 2019)
Stable Pointers to Value III: Recursive Quantilization (abramdemski, 21 Jul 2018)
Thoughts on Quantilizers (Stuart_Armstrong, 2 Jun 2017)
Steam (abramdemski, 20 Jun 2022)
How to safely use an optimizer (Simon Fischer, 28 Mar 2024)
[Aspiration-based designs] 2. Formal framework, basic algorithm (28 Apr 2024)
Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs (Roland Pihlakas, 22 Jun 2025)
AISC project: SatisfIA – AI that satisfies without overdoing it (Jobst Heitzig, 11 Nov 2023)
Thinking about maximization and corrigibility (James Payor, 21 Apr 2023)
[Aspiration-based designs] 1. Informal introduction (28 Apr 2024)
Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well). Subtleties and Open Challenges. (Roland Pihlakas, 12 Jan 2025)
AISC team report: Soft-optimization, Bayes and Goodhart (27 Jun 2023)
Aspiration-based Q-Learning (27 Oct 2023)
Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning (Roger Dearnaley, 21 Feb 2023)
Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning (Roman Leventov, 12 Jan 2023)
The Optimizer’s Curse and How to Beat It (lukeprog, 16 Sep 2011)
Validator models: A simple approach to detecting goodharting (beren, 20 Feb 2023)
Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue) (16 Mar 2025)
Exploring Mild Behaviour in Embedded Agents (Megan Kinniment, 27 Jun 2022)