# Beyond algorithmic equivalence: algorithmic noise

There is a ‘no-free-lunch’ theorem in value learning: without assuming anything about an agent’s rationality, you can’t deduce anything about its reward, and vice versa.

Here I’ll investigate whether you can deduce more if you start looking into the structure of the algorithm.

## Algorithm (in)equivalence

To do this, we’ll be violating the principle of algorithmic equivalence: that two algorithms with the same input-output maps should be considered the same algorithm. Here we’ll instead be looking inside the algorithm, imagining that we have either the code, a box diagram, an fMRI scan of a brain, or something analogous.

To illustrate the idea, I’ll consider a very simple model of the anchoring bias. An agent (the “Human” H) is given an object o (in the original experiment, this could be wine, a book, chocolates, a keyboard, or a trackball) and a random integer x, and is asked to output how much they would pay for the object.

They will output (3/4)v(o) + (1/4)x, for some valuation subroutine v that is independent of x. This gives a quarter weight to the anchor x.

Assume that v(o) tracks three facts about o: the person’s need for o, the emotional valence the person feels at seeing it, and a comparison with objects that have similar features. Call these three subroutines Need, Emo, and Sim. For simplicity, we’ll assume each subroutine outputs a single number, which then gets averaged.
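As a minimal sketch, the valuation and anchoring described above might look as follows. The subroutine bodies and the dict encoding of the object are illustrative assumptions, and I read “a quarter weight” as a 3/4 to 1/4 weighted average:

```python
# Toy sketch of the anchored valuation (all subroutine bodies and the
# dict encoding of the object o are illustrative assumptions).

def need(o):
    # the person's need for the object
    return o["need"]

def emo(o):
    # emotional valence on seeing the object
    return o["valence"]

def sim(o):
    # comparison with objects that have similar features
    return o["similar_price"]

def v(o):
    # valuation subroutine: the three outputs get averaged,
    # independent of the anchor x
    return (need(o) + emo(o) + sim(o)) / 3

def human(o, x):
    # stated willingness to pay, with a quarter weight on the anchor x
    return 0.75 * v(o) + 0.25 * x

wine = {"need": 20.0, "valence": 30.0, "similar_price": 40.0}
print(human(wine, 0))    # no anchor: 0.75 * 30.0 = 22.5
print(human(wine, 100))  # a high anchor drags the answer up to 47.5
```

Even this toy version shows the anchoring effect: the same object gets a substantially higher stated price under a high anchor.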

Now consider four models of H as follows, with arrows showing the input-output flows (the original diagrams are omitted here; roughly, in a) and b) the anchor x feeds into Sim and Need, while in d) it feeds into Emo):

I’d argue that a) and b) imply that the anchoring bias is a bias, c) is neutral, and d) implies (at least weakly) that the anchoring bias is not a bias.

How so? In a) and b), x maps straight into Sim and Need. Since x is random, it has no bearing on how much o is needed, nor on how valuable similar objects are. Therefore, it makes sense to see its contribution as noise or error.

In d), on the other hand, it is superficially plausible that a recently heard random input could have some emotional effect (if x were not a number but a scream, we’d expect it to have an emotional impact). So if we wanted to argue that the anchoring bias is actually not a bias, but that people genuinely derive pleasure from outputting numbers close to numbers they have heard recently, then Emo would be the right place for x to go. Setup c) is not informative either way.
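To make the contrast concrete, here is a sketch of two of the wirings. The routing of x is my assumption about what the omitted diagrams show, and the coefficients are deliberately chosen so that both wirings produce the same input-output map; that is exactly why the internal structure carries extra information:

```python
# Two wirings with identical input-output behaviour but different
# internal routing of the anchor x (the routing is an assumption about
# diagrams a) and d); subroutine bodies are illustrative stubs).

def need(o): return o["need"]
def emo(o): return o["valence"]
def sim(o): return o["similar_price"]

def human_a(o, x):
    # a)-style: x is folded into the Sim subroutine, where a random
    # input is naturally read as noise, so anchoring looks like a bias
    sim_out = sim(o) + x
    return 0.75 * (need(o) + emo(o) + sim_out) / 3

def human_d(o, x):
    # d)-style: x is folded into the Emo subroutine, supporting the
    # reading that the agent enjoys answers near recently heard numbers
    emo_out = emo(o) + x
    return 0.75 * (need(o) + emo_out + sim(o)) / 3

wine = {"need": 20.0, "valence": 30.0, "similar_price": 40.0}
# Same outputs for every (o, x): a black-box observer cannot tell
# these two wirings apart, yet they suggest opposite verdicts on bias.
print(human_a(wine, 100), human_d(wine, 100))
```

The design choice here is the crux of the post: since the two functions agree extensionally, only a look inside the box can distinguish “noise in Sim” from “preference in Emo”.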

## Symbols

There’s something very GOFAI about the setup above, with labelled nodes with definite functionality. You certainly wouldn’t want the conclusions to change if, for instance, I exchanged the labels of Emo and Sim!

What I’m imagining here is that a structural analysis of H finds this decomposition as a natural one, and that the labels and functionality of the different modules are then established by seeing what they do in other circumstances (“Sim always accesses memories of similar objects...”).

People have divided parts of the brain into functional modules, so this is not a completely vacuous approach. Indeed, it most resembles “symbol grounding” in reverse: we know the meaning of the various objects in the world, we know what H does, and we want to find the corresponding symbols within it.

## Normative assumptions

The no-free-lunch result still applies in this setting; all that’s happened is that we’ve replaced the set P of planners (which were maps from reward functions to policies) with the set A of algorithms (that map reward functions to policies). Indeed, P is just the set of equivalence classes in A, with equivalence between algorithms defined by algorithmic equivalence, and the no-free-lunch results still apply.
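A sketch of this quotient, treating a planner as the equivalence class of all algorithms with the same input-output map. Encoding reward functions as dicts and checking equivalence only on sampled inputs are both simplifying assumptions (full extensional equality is not decidable in general):

```python
# Sketch: two structurally different algorithms that induce the same
# reward-to-policy map, hence fall in the same equivalence class
# (the same planner). Reward functions are encoded as dicts from
# action to value; both encodings are simplifying assumptions.

def algo_greedy(reward):
    # scan once, keep the best action seen so far
    best = None
    for action, value in reward.items():
        if best is None or value > reward[best]:
            best = action
    return best

def algo_sorting(reward):
    # different internal structure: sort all actions, take the top
    return sorted(reward, key=lambda a: reward[a])[-1]

def same_planner(f, g, sample_rewards):
    # algorithmic equivalence, spot-checked extensionally on a sample
    return all(f(r) == g(r) for r in sample_rewards)

rewards = [{"left": 1.0, "right": 2.0}, {"left": 3.0, "right": 0.5}]
print(same_planner(algo_greedy, algo_sorting, rewards))  # True
```

Under algorithmic equivalence these two count as one planner, so any no-free-lunch result stated over planners carries over unchanged to algorithms.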

The above approach does not absolve us from the necessity of making normative assumptions, but hopefully these will be relatively light ones. To make this fully rigorous, we would come up with a definition that decomposes any algorithm into modules, identifies noise such as the x contributions in Sim and Need, and then trims that out (by which we mean: identifies the noise with the planner, not the reward).

It’s still philosophically unsatisfactory, though: what are the principled reasons for doing so, apart from the fact that it gives the right answer in this one case? See my next post, where we explore a bit more of what can be done with the internal structure of algorithms: the algorithm will start to model itself.

• It seems like in practice this kind of internal algorithmic inequivalence is always detectable. That is, for a given human you could always figure out which of the four possibilities is occurring just by feeding the black box different inputs, and the possibilities would diverge in meaningful behavioral ways in the appropriate circumstances.

It also seems like the reason I have intuitions that those four cases are “different” is actually because I expect them to result in different outside-the-black-box behaviors when you vary the inputs. Is there a concrete example where an internal structural difference cannot be detected outside the box? It’s not obvious to me that I would care about such a difference.
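A quick sketch of the probing idea in this comment, with everything below an illustrative assumption: a black-box finite-difference probe on x recovers the anchor’s overall weight, but if two wirings route x into different modules with the same coefficient, the probe cannot separate them:

```python
# Probe sketch: varying the anchor x with the object held fixed
# recovers how much weight x gets, but not which internal module it
# flows through. Both functions are illustrative stand-ins whose
# internal routing of x differs only in the (unobservable) label.

def human_x_via_sim(v_o, x):
    # imagine x entering through the Sim module internally
    return 0.75 * v_o + 0.25 * x

def human_x_via_emo(v_o, x):
    # imagine x entering through the Emo module instead
    return 0.75 * v_o + 0.25 * x

def anchor_weight(human, v_o):
    # black-box finite-difference probe in x
    return human(v_o, 1.0) - human(v_o, 0.0)

print(anchor_weight(human_x_via_sim, 30.0))  # 0.25
print(anchor_weight(human_x_via_emo, 30.0))  # 0.25: same from outside
```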

• It’s detectable because the algorithms are clean and simple as laid out here. Make them a bit messier, add a few almost-irrelevant cross-connections, and it becomes a lot harder.

In theory, of course, you could run an entire world self-contained inside an algorithm, and algorithmic equivalence would argue that it is therefore irrelevant.

And in practice, what I’m aiming for is to use “human behaviour + brain structure + fMRI outputs” to get more than just “human behaviour”. It might be that those are equivalent in the limit of a super AI that can analyse every counterfactual universe, yet different in practice for real AIs.

• Is this any different from just saying that the rationality model is the entire graph (the agent will maximize H), and the true utility function is Emo?

• No, that doesn’t seem right. The way I set it up, Emo corresponds to short-term feelings.