Thoughts and problems with Eliezer’s measure of optimization power

Back in the day, Eliezer proposed a method for measuring the optimization power (OP) of a system S. The idea is to get a measure of how small a target the system can hit:

You can quantify this, at least in theory, supposing you have (A) the agent or optimization process’s preference ordering, and (B) a measure of the space of outcomes—which, for discrete outcomes in a finite space of possibilities, could just consist of counting them—then you can quantify how small a target is being hit, within how large a greater region.

Then we count the total number of states with equal or greater rank in the preference ordering to the outcome achieved, or integrate over the measure of states with equal or greater rank. Dividing this by the total size of the space gives you the relative smallness of the target—did you hit an outcome that was one in a million? One in a trillion?

Actually, most optimization processes produce “surprises” that are exponentially more improbable than this—you’d need to try far more than a trillion random reorderings of the letters in a book, to produce a play of quality equalling or exceeding Shakespeare. So we take the log base two of the reciprocal of the improbability, and that gives us optimization power in bits.

For example, assume there were eight equally likely possible states {X0, X1, … , X7}, and S gives them utilities {0, 1, … , 7}. Then if S can make X6 happen, there are two states better than or equal to its achievement (X6 and X7), hence it has hit a target filling 1/4 of the total space. Hence its OP is log2 4 = 2. If the best S could manage is X4, then it has only hit half the total space, and has an OP of only log2 2 = 1. Conversely, if S reached the perfect X7, 1/8 of the total space, then it would have an OP of log2 8 = 3.
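The calculation above can be sketched in a few lines of code. This is purely my illustration, not anything from the original posts; the helper name `op` and the dictionary representation of states are assumptions.

```python
import math

def op(utilities, achieved):
    # Optimization power in bits, for equally likely discrete outcomes:
    # log2(total states / states at least as good as the one achieved).
    at_least_as_good = sum(1 for u in utilities.values()
                           if u >= utilities[achieved])
    return math.log2(len(utilities) / at_least_as_good)

states = {f"X{i}": i for i in range(8)}  # eight states with utilities 0..7
print(op(states, "X6"))  # hits 1/4 of the space -> 2.0 bits
print(op(states, "X4"))  # hits 1/2 of the space -> 1.0 bit
print(op(states, "X7"))  # hits 1/8 of the space -> 3.0 bits
```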

The system, the whole system, and everything else in the universe

Notice that OP is defined in terms of the state that S achieved (for the moment this will be a pure world, but later we’ll allow probabilistically mixed worlds to be S’s “achievement”). So it gives us a measure of how powerful S is in practice in our model, not some platonic measure of how good S is in general situations. So an idiot king has more OP than a brilliant peasant; a naive search algorithm distributed across the internet has more OP than a much better program running on Colossus. This does not seem a drawback to OP: after all, we want to measure how powerful a system actually is, not how powerful it could be in other circumstances.

Similarly, OP measures the system’s ability to achieve its very top goals, not how hard these goals are. A system that wants to compose a brilliant sonnet has more OP than exactly the same system that wants to compose a brilliant sonnet while embodied in the Andromeda galaxy, even though the second is plausibly more dangerous. So OP is a very imperfect measure of how powerful a system is.

We could maybe extend this to some sort of “opposed OP”: what is the optimization power of S, given that humans want to stop it from achieving its goals? But even there, a highly powerful system with nearly unachievable goals will still have a very low opposed OP. Maybe the difference between the opposed OP and the standard OP is a better measure of power.

As pointed out by Tim Tyler, OP can also increase if we change the size of the solution space. Imagine an agent that has to print out a non-negative integer N, and whose utility is -N. The agent will obviously print 0, but if the printer is limited to ten digit numbers, its OP is smaller than if the printer is limited to twenty digit numbers: though the solution is just as easy and obvious, the number of ways it “could have been worse” is increased, increasing OP.
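A quick sketch of the printer example, assuming (my assumption, for illustration) a uniform prior over all printable numbers:

```python
import math

def printer_op(max_digits):
    # The agent prints 0, the best outcome; suppose every non-negative
    # integer with at most max_digits digits was equally likely a priori.
    n_outcomes = 10 ** max_digits   # 0 .. 10^max_digits - 1
    return math.log2(n_outcomes)    # only 0 itself is "at least as good"

print(printer_op(10))  # about 33.2 bits
print(printer_op(20))  # about 66.4 bits: same trivial task, double the OP
```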

Is OP an entropy? Is it defined for mixed states?

In his post Eliezer makes a comparison between OP and entropy. And OP does have some of the properties of entropy: for instance, if S is optimizing two separate independent processes (and its own utility treats them as independent), then its OP is the sum of the OPs for each process. If for instance S hit an area of 1/4 in the first process (OP 2) and 1/8 in the second (OP 3), then it hits an area of 1/(4*8) = 1/32 for the joint processes, for an OP of 5. This property, incidentally, is what allows us to talk about “the” entropy of an isolated system, without worrying about the rest of the universe.
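The additivity property is just the statement that target fractions multiply while their log2s add; a one-line check (the helper function is my own, hypothetical):

```python
import math

def op_from_fraction(fraction):
    # OP of hitting a target comprising this fraction of the outcome space.
    return math.log2(1 / fraction)

op_a = op_from_fraction(1/4)                # 2 bits in the first process
op_b = op_from_fraction(1/8)                # 3 bits in the second
op_joint = op_from_fraction((1/4) * (1/8))  # independent targets multiply
print(op_joint)  # 5.0, equal to op_a + op_b
```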

But now imagine that our S in the first example can’t be sure to hit a pure state, but has a 50% chance of hitting X7 and 50% of hitting X4. If OP were an entropy, then we’d simply do a weighted sum 1/2 (OP(X4) + OP(X7)) = 1/2 (1+3) = 2, and then add one extra bit of entropy to represent our (binary) uncertainty as to what state we were in, giving a total OP of 3. But this is the same OP as X7 itself! And obviously a 50% chance of X7 and 50% chance of something inferior cannot be as good as a certainty of the best possible state. So unlike entropy, mere uncertainty cannot increase OP.
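The entropy-style calculation being rejected here, spelled out as arithmetic (purely illustrative):

```python
# If OP behaved like entropy, the 50/50 mix of X4 (OP 1) and X7 (OP 3)
# would get the weighted average of the pure-state OPs...
weighted_avg = 0.5 * 1 + 0.5 * 3
# ...plus one bit for the binary uncertainty over which state obtains:
entropy_style_op = weighted_avg + 1
print(entropy_style_op)  # 3.0, the same as certainly achieving X7
```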

So how should OP extend to mixed states? Can we write a simple distributive law:

OP(1/2 X4 + 1/2 X7) = 1/2 (OP(X4) + OP(X7)) = 2?

It turns out we can’t. Imagine that, without changing anything else, the utility of X7 is suddenly set to ten trillion, rather than 7. The OP of X7 is still 3 - it’s still the best option, still with probability 1/8. And yet 1/2 X4 + 1/2 X7 is now obviously much, much better than X6, which has an OP of 2. But now let’s reset X6 to being ten trillion minus 1. Then it still has an OP of 2, and yet is now much, much better than 1/2 X4 + 1/2 X7.
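The point of the counterexample is that OP depends only on the preference ordering, so rescaling utilities leaves it untouched while radically changing how good mixtures are. A sketch (the `op` helper and the dictionaries are my own illustration):

```python
import math

def op(utilities, achieved):
    # OP over equally likely states: depends only on the ordering.
    at_least_as_good = sum(1 for u in utilities.values()
                           if u >= utilities[achieved])
    return math.log2(len(utilities) / at_least_as_good)

base = {f"X{i}": i for i in range(8)}
boosted = dict(base, X7=10**13)  # utility of X7 set to ten trillion

# OP sees only the ordering, so boosting X7 changes nothing...
print(op(base, "X7"), op(boosted, "X7"))  # 3.0 3.0
# ...but the expected utility of 1/2 X4 + 1/2 X7 now dwarfs X6 (OP 2):
mix_value = 0.5 * boosted["X4"] + 0.5 * boosted["X7"]
print(mix_value > boosted["X6"])  # True
```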

But I may have been unfair in those examples. After all, we’re looking at mixed states, and X6 need not have a fixed OP of 2 in the space of mixed states. Maybe if we looked at the simplex formed by all mixed states made up of {X0, X1, … , X7}, we could get these results to work? Since all Xi are equally likely, we’d simply put a uniform measure on that simplex. But now we run into another problem: the OP of X7 has suddenly shot up to infinity! After all, X7 is now an event of probability zero, better than any other outcome; the log2 of the inverse of its probability is infinity. Even if we just restrict to a tiny non-zero area around X7, we get arbitrarily high OP—it’s not a fluke or a calculation error. Which means that if we followed the distributive law, Q = (1-10^-1000) X0 + 10^-1000 X7 must have a much larger OP than X6 - despite the fact that nearly every possible outcome is better than Q.

So it seems that unlike entropy, OP cannot have anything resembling a distributive law. The set of possible outcomes that you started with—including any possible mixed outcomes that S could cause—is what you’re going to have to use. This sits uncomfortably with the whole Bayesian philosophy—after all, mixed states there shouldn’t represent anything but uncertainty between pure states. They shouldn’t be listed as separate outcomes.

Measures and coarse-graining

In the previous section, we moved from using a finite set of equally likely outcomes, to a measure over a simplex of mixed outcomes. This is the natural generalisation of OP: simply compute the probability measure of the states at least as good as what S achieves, and use the log2 of the inverse of this measure as OP.
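In code, the generalisation just replaces state-counting with summing a prior measure. The helper, the three-state example, and the prior values here are all made up for illustration:

```python
import math

def op_measure(utilities, prior, achieved):
    # OP under an arbitrary prior: log2 of the inverse of the prior mass
    # of outcomes at least as good as the one achieved.
    mass = sum(p for s, p in prior.items()
               if utilities[s] >= utilities[achieved])
    return math.log2(1 / mass)

utilities = {"X0": 0, "X1": 1, "X2": 2}
prior = {"X0": 0.7, "X1": 0.2, "X2": 0.1}  # made-up non-uniform prior
print(op_measure(utilities, prior, "X2"))  # log2(1/0.1), about 3.32 bits
```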

Some of you may have spotted the massive elephant in the room, whose mass twists space and underlines and undermines the definition of OP. What does this probability measure actually represent? Eliezer saw it in his original post:

The quantity we’re measuring tells us how improbable this event is, in the absence of optimization, relative to some prior measure that describes the unoptimized probabilities.

For how else could I write “there were eight equally likely possible states” and “S can make X6 happen”? Well, obviously, what I meant was that if S didn’t exist, then it would be equally likely that X7 and X6 and X5 and X4 and...

But wait! These Xi’s are final states of the world—so they include the information as to whether S existed in them or not. So what I’m actually saying is that {X0(¬S), X1(¬S), … , X7(¬S)} (the worlds with no S) are equally likely, whereas the Xi(S) (the worlds with S) are impossible for i≠6. But what has allowed me to identify X0(¬S) with X0(S)? I’m claiming they’re the same world “apart from S”, but what does this mean? After all, S can have huge impacts, and X0(S) is actually an impossible world! So I’m saying that “these two worlds are strictly the same, apart from the fact that S exists in one of them, but then again, S would never allow that world to happen if it did exist, so, hmm...”

Thus it seems that we need to use some sort of coarse-graining to identify Xi(¬S) with Xi(S), similar to those I speculated on in the reduced impact post.