Bayesian Evolving-to-Extinction

The present discussion owes a lot to Scott Garrabrant and Evan Hubinger.

In Defining Myopia, I formalized temporal or cross-instance myopia / non-myopia, but I claimed that there should also be some kind of single-instance myopia which I hadn’t properly captured. I also suggested this in Predict-O-Matic.

This post is intended to be an example of single-instance partial agency.

Evolving to Extinction

Evolution might be myopic in a number of ways, but one way is that it’s myopic across individuals: it typically produces results very different from what group selection would produce, because it’s closer to optimizing relative fitness of individuals (relative to each other) than it is to optimizing overall fitness. Adaptations which help members of a species compete with each other are a great example of this. Why increase your own fitness, when you can just decrease someone else’s instead? We’re lucky that it’s typically pretty hard, at least historically, to do things which are bad across the board but slightly less bad for the one doing them. Imagine a “toxic gas gene” which makes the air harder for everyone to breathe, but slightly less so for carriers of the gene. Such a gene would be selected for. This kind of thing can be selected for even to the point where it drives the population of a species right down to zero, as Eliezer’s essay on evolving to extinction highlighted.

Actually, as Eliezer’s essay emphasized, it’s not even that evolution is myopic at the level of individuals; evolution is myopic down to the level of individual genes, an observation which better explains the examples of evolving-to-extinction which he discusses. (This is, of course, the point of Dawkins’ book The Selfish Gene.) But the analogy of myopia-across-individuals will suit me better here.

Bayes “Evolving to Extinction”

The title of this post is hyperbole, since there isn’t an analog of an extinction event in the model I’m about to describe. But it illustrates that, in extreme circumstances, a Bayesian learner can demonstrate the same kind of pathological behavior that evolution does when it ends up selecting for relative fitness in a way which pumps against absolute fitness.

Like evolution, Bayes’ Law will “optimize”[1] for relative fitness of hypotheses, not absolute fitness. Ordinarily there isn’t enough of a difference for this to matter. However, I’ve been discussing scenarios where the predictor can significantly influence what’s being predicted. Bayes’ Law was not formulated with examples like this in mind, and we can get pathological behavior as a result.
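To make the “relative, not absolute” point concrete, here is a minimal sketch (my own, not from the original post). A Bayesian update depends only on the ratios between hypotheses’ likelihoods, so uniformly degrading every hypothesis’s predictions leaves the posterior completely unchanged; the update has no way to notice that the whole pool got worse at predicting.

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])        # three hypotheses
likelihoods = np.array([0.9, 0.6, 0.3])  # P(observation | hypothesis)

def posterior(prior, likelihoods):
    """Bayes' Law: normalize prior times likelihood."""
    unnorm = prior * likelihoods
    return unnorm / unnorm.sum()

print(posterior(prior, likelihoods))         # approx. [0.652 0.261 0.087]
print(posterior(prior, likelihoods * 0.01))  # identical, despite far worse absolute fit
```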

One way to construct an example is to imagine that there is a side-channel by which hypotheses can influence the world. The “official” channel is to output predictions; but let’s say the system also produces diagnostic logs which predictors can write to, and which humans read. A predictor can (for example) print stock tips into the diagnostic logs, to get some reaction from humans.

Say we have a Bayesian predictor, consisting of some large but fixed number of hypotheses. An individual hypothesis “wants” to score well relative to others. Let’s also say, for the sake of argument, that all hypotheses have the ability to write to diagnostic logs, but humans are more likely to pay attention to the diagnostics for more probable hypotheses.

How should a hypothesis make use of this side-channel? It may initially seem like it should use it to make the world more predictable, so that it can make more accurate predictions and thus get a better score. However, this would make a lot of hypotheses score better, not just the one printing the manipulative message. So it wouldn’t really be selected for.

Instead, a hypothesis could print manipulative messages designed to get humans to do things which no other hypothesis anticipates. This means specifically optimizing for low-probability events to happen. Hypotheses which successfully accomplish this get a large boost in relative predictive accuracy, making them more probable according to Bayes’ Law.

So, a system in this kind of situation eventually winds up being dominated by hypotheses which manipulate events to be as unpredictable as possible (by that very system), subject to the constraint that one hypothesis or another within the system can predict them.
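The selection dynamic above can be simulated directly. The following is a toy sketch of my own construction; the specific numbers, and the rule that the side-channel’s influence scales with the manipulator’s posterior weight (standing in for humans paying more attention to the logs of probable hypotheses), are assumptions. Hypothesis 0 uses the side-channel to push a normally-rare event into happening and, since it knows what it is doing, predicts that event accordingly, while the other hypotheses keep predicting the base rate. Bayes’ Law rewards the manipulator, its influence grows, and the event frequency climbs far above the base rate: the world gets steered toward outcomes the rest of the system considered unlikely.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hyp = 4
posterior = np.full(n_hyp, 1 / n_hyp)  # uniform prior over four hypotheses
base_rate = 0.1                        # how often the event happens without interference

event_count = 0
for step in range(500):
    # Side-channel: manipulation succeeds in proportion to the manipulator's posterior weight.
    p_event = base_rate + (1 - base_rate) * posterior[0]
    preds = np.full(n_hyp, base_rate)  # honest hypotheses predict the base rate
    preds[0] = p_event                 # the manipulator predicts its own handiwork

    event = rng.random() < p_event
    event_count += int(event)

    likelihood = preds if event else 1 - preds  # Bayes' Law update
    posterior = posterior * likelihood
    posterior /= posterior.sum()

print("manipulator's posterior weight:", round(float(posterior[0]), 3))
print("observed event frequency:", event_count / 500, "vs. base rate", base_rate)
```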

This is very much like what I called the entropy-market problem for futarchy, also known as the assassination-market problem. (Any prediction market involving the lifespan of public figures is equivalent to an assassination market; it pays for the death of public figures, since that is a hard-to-predict but easier-to-control event.)

Analogous problems arise if there is no side-channel but the prediction itself can influence events (which seems very plausible for realistic predictions).

Is This Myopia?

If we use “myopia” to point to the kind of non-strategic behavior we might actually want out of a purely predictive system, this isn’t myopia at all. For this reason, and for other reasons, I’m more comfortable throwing this under the umbrella term “partial agency”. However, I think it’s importantly related to myopia.

  • Just like we can think of evolution as myopically optimizing per-individual, uncaring of overall harm to reproductive fitness if that harm went along with improvements to individual relative fitness, we can think of Bayes’ Law as myopically optimizing per-hypothesis, uncaring of overall harm to predictive accuracy.

  • The phenomenon here doesn’t illustrate the “true myopia” we would want of a purely predictive system, since it ends up manipulating events. However, it at least shows that there are alternatives. One might have argued “sure, I get the idea of cross-instance myopia, showing that per-instance optimization is (possibly radically) different from cross-instance optimization. But how could there be per-instance myopia, as distinct from per-instance optimization? How can partial agency get any more partial than myopically optimizing individual instances?” Bayes-evolving-to-extinction clearly shows that we can break things down further. So perhaps there’s still room for a further “true myopia” which codifies non-manipulation even for single instances.

  • This phenomenon also continues the game-theoretic theme. Just as we can think of per-instance myopia as stopping cross-instance optimization by way of a Molochian race-to-the-bottom, we see the same thing here.

Neural Nets / Gradient Descent

As I’ve mentioned before, there is a potentially big difference between multi-hypothesis setups like Bayes and single-hypothesis setups like gradient-descent learning. Some of my arguments, like the one above, involve hypotheses competing with each other to reach Molochian outcomes. We need to be careful in relating this to cases like gradient descent learning, which might approximate Bayesian learning in some sense, but incrementally modifies a single hypothesis rather than letting many hypotheses compete.

One intuition is that stochastic gradient descent will move the network weights around, so that we are in effect sampling many hypotheses within some region. Under some circumstances, the most successful weight settings could be the ones which manipulate things to maximize local gradients in their general direction, which means punishing other nearby weight configurations; this could involve increasing the loss, much like the Bayesian case. (See Gradient Hacking.)

There is also the “lottery ticket hypothesis” to consider (discussed on LW here and here): the idea that a big neural network functions primarily like a bag of hypotheses, not like one hypothesis which gets adapted toward the right thing. We can imagine different parts of the network fighting for control, much like the Bayesian hypotheses.

More formally, though, we can point to some things which are moderately, but not perfectly, analogous.

If we are adapting a neural network using gradient descent, but there is a side-channel which we are not accounting for in our credit assignment, then the gradient descent will not optimize the side-channel. This might result in aimless thrashing behavior.

For example, suppose that the loss explicitly depends only on the output X of a neural net (i.e., the gradient calculation is a gradient on the output). However, the loss actually depends on an internal node Y, in the following way:

  • When |X-Y| is high, the loss function rewards X being high.

  • When |X-Y| is low, the loss function rewards X being low.

  • When X is high, the loss function rewards low |X-Y|.

  • When X is low, the loss function rewards high |X-Y|.

  • When both values are middling, the loss function incentivizes X to be less middling.

This can spin around forever. It is of course an extremely artificial example, but the point is to demonstrate that when gradient descent does not recognize all the ways the network influences the result, we don’t necessarily see behavior which “tries to reduce loss”, or even appears to optimize anything.
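To see the spinning concretely, here is a stylized sketch, entirely my own construction: the post does not specify an architecture, so the way y gets dragged around below is an artificial stipulation standing in for unmodeled couplings in the network, and the fifth bullet is omitted for simplicity. The learner only credits the explicit dependence of the loss on the output x (the first two bullets), so its update pushes x up when |x-y| is large and down when it is small. As a side effect of the weight updates, the internal node y drifts toward x when x is high and away from x when x is low (the next two bullets), even though no gradient is ever computed through y. The observed loss wanders instead of decreasing.

```python
import numpy as np

def full_loss(x, y):
    """Loss encoding the bullets above: (1-2|x-y|)*x covers bullets 1-2; (2x-1)*|x-y| covers bullets 3-4."""
    d = abs(x - y)
    return (1 - 2 * d) * x + (2 * x - 1) * d

x, y = 0.9, 0.1  # x: output node, y: internal node, both kept in [0, 1]
lr = 0.05
losses = []
for step in range(400):
    d = abs(x - y)
    grad_x = 1 - 2 * d                        # only the explicit x-path is credited
    x = float(np.clip(x - lr * grad_x, 0, 1))
    # Side-effect drift of the internal node (an assumption, not a gradient step):
    # y is dragged toward x when x is high, and pushed away from x when x is low.
    y_target = x if x > 0.5 else (1.0 if y >= x else 0.0)
    y = float(np.clip(y + lr * np.sign(y_target - y), 0, 1))
    losses.append(full_loss(x, y))

print("loss over first 5 steps:", np.round(losses[:5], 3))
print("loss over last 5 steps: ", np.round(losses[-5:], 3))
print("change in 100-step mean loss, end minus start:",
      round(float(np.mean(losses[-100:]) - np.mean(losses[:100])), 3))
```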


  1. The whole point of the partial agency sequence is that words like “optimize” are worryingly ambiguous, but I don’t have sufficiently improved terminology yet that I feel I can just go ahead and use it while maintaining clarity!! In particular, the sense in which Bayesian updates optimize for anything is pretty unclear when you think about it, yet there is certainly a big temptation to say that they optimize for predictive accuracy (in the log-loss sense). ↩︎