# Strong implication of preference uncertainty

Here is a theory that is just as good as general relativity:

AGR (Angel General Relativity): Tiny invisible angels push around all the particles in the universe in a way that is indistinguishable from the equations of general relativity.

This theory is falsifiable, just as general relativity (GR) itself is. Indeed, since it gives exactly the same predictions as GR, a Bayesian will never find evidence that prefers it over Einstein's theory.

Therefore, I obviously deserve a Nobel prize for suggesting it.

# Enter Occam's shaving equipment

Obviously the angel theory is not a revolutionary new theory. Partially because I've not done any of the hard work, just constructed a pointer to Einstein's theory. But, philosophically, the main justification is Occam's razor: the simplest theory is to be preferred.

From a Bayesian perspective, you could see violations of Occam's razor as cheating, using your posterior as priors. There is a whole class of "angels are pushing particles" theories, and AGR is just a small portion of that space. By considering AGR and GR on an equal footing, we're privileging AGR above what it deserves[1].

# In physics, Occam's razor doesn't matter for strictly identical theories

Occam's razor has two roles: the first is to distinguish between strictly identical theories; the second is to distinguish between theories that give the same predictions on the data so far, but may differ in the future.

Here, we focus on the first case: GR and AGR are strictly identical; no data will ever distinguish them. In essence, the theory that one is right and the other wrong is not falsifiable.

What that means is that, though AGR may be a priori less likely than GR, the relative probability between the two theories will never change: they make the same predictions. And also because they make the same predictions, that relative probability is irrelevant in practice: we could use AGR just as well as GR for predictions.
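The constancy of that relative probability is just Bayes' theorem at work: if two theories assign the same likelihood to every observation, each update multiplies both posteriors by the same factor, so their ratio never moves. A minimal sketch (the numbers and theory names here are illustrative, not from the post):

```python
# Bayesian updating over two theories that assign identical likelihoods
# to every observation. The prior gap (the Occam penalty on the angels)
# survives unchanged, no matter how much data comes in.

def update(priors, likelihoods):
    """One step of Bayes' rule over a dict of theory -> probability."""
    unnorm = {t: priors[t] * likelihoods[t] for t in priors}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

# AGR starts with a much lower prior.
beliefs = {"GR": 0.99, "AGR": 0.01}

for _ in range(100):  # a hundred observations
    # By construction, both theories predict every observation equally well.
    beliefs = update(beliefs, {"GR": 0.8, "AGR": 0.8})

# The ratio P(GR)/P(AGR) is (up to floating point) exactly what it was
# before any data arrived.
print(beliefs["GR"] / beliefs["AGR"])
```

The same loop with unequal likelihoods would drive the ratio toward the better-predicting theory; with strictly identical theories, no such data can ever exist.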

# How preferences differ

Now let's turn to preferences, as described in our paper "Occam's razor is insufficient to infer the preferences of irrational agents".

Here two sets of preferences are "prediction-identical", in the sense of the physics theories above, if they predict the same behaviour for the agent. So that means that two different preference-based explanations for the same behaviour will never change their relative probabilities.

Worse than that, Occam's razor doesn't solve the issue. The simplest explanation of, say, human behaviour is that humans are fully rational at all times. This isn't the explanation that we want.

Even worse than that, prediction-identical preferences will lead to vastly different consequences if we program an AI to maximise them.

So, in summary:

1. Prediction-identical preferences never change relative probability.

2. The simplest prediction-identical preferences are known to be wrong for humans.

3. It could be very important for the future to get the right preferences for humans.

1. GR would make up a larger portion of the class of "geometric theories of space-time" than AGR makes up of the class of "angels are pushing particles" theories, and would be more likely than AGR anyway, especially after updating on the non-observation of angels. ↩︎

• "And also because they make the same predictions, that relative probability is irrelevant in practice: we could use AGR just as well as GR for predictions."

There is a subtle sense in which the difference between AGR and GR is relevant. While the difference doesn't change the predictions, it may change the utility function. An agent that cares about angels (if they exist) might do different things if it believes itself to be in the AGR world rather than the GR world. Since the theories make identical predictions, the agent's belief only depends on its priors (and any irrationality), not on which world it is in. Nonetheless, this means that the agent will pay to avoid having its priors modified, even though the modification doesn't change the agent's predictions in the slightest.

• Can you give an example of two sets of preferences which are prediction-identical, but which will lead to "vastly different consequences if [you] program an AI to maximi[z]e them"?

• The most basic examples are comparisons between derived preferences that assume the human is always rational (i.e. every action they take, no matter how mistaken it may appear, is in the service of some complicated plan for how the universe's history should go. My friend getting drunk and knocking over his friend's dresser was all planned and totally in accordance with their preferences.), and derived preferences that assume the human is irrational in some way (e.g. maybe they would prefer not to drink so much coffee, but can't wake up without it, and so the action that best fulfills their preferences is to help them drink less coffee).

But more intuitive examples might involve comparisons between two different sorts of human irrationality.

For example, in the case of coffee, the AI is supposed to learn that the human has some pattern of thoughts and inclinations that mean it actually doesn't want coffee, and its actions of drinking coffee are due to some sort of limitation or mistake.

But consider a different mistake: not doing heroin. After all, upon trying heroin, the human would be happy and would exhibit behavior consistent with wanting heroin. So we might imagine an AI that infers that humans want heroin, and that their current actions of not trying heroin are due to some sort of mistake.

Both theories can be prediction-identical: the two different sets of "real preferences" just need to be filtered through two different models of human irrationality. Depending on what you classify as "irrational," this degree of freedom translates into a change in what you consider "the real preferences."
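The heroin example can be made concrete with a toy decomposition of observed behaviour into a (preferences, irrationality-model) pair. This is an illustrative sketch with invented names, not code from the paper: the point is only that two pairs can agree on every prediction while disagreeing about what a preference-maximising AI should do.

```python
# Two (preferences, irrationality-model) pairs that predict the same human
# behaviour but imply very different AI policies.

# Observed behaviour: the human never takes heroin.
OBSERVED_ACTION = "abstain"

def predicted_action(prefers_heroin, makes_mistake):
    """What the (preferences, irrationality) pair predicts the human does."""
    intended = "take" if prefers_heroin else "abstain"
    # The irrationality model can say the human mistakenly fails to act
    # on their "real" preference.
    return "abstain" if makes_mistake else intended

# Theory A: the human doesn't want heroin and acts rationally.
theory_a = dict(prefers_heroin=False, makes_mistake=False)
# Theory B: the human wants heroin but mistakenly avoids it.
theory_b = dict(prefers_heroin=True, makes_mistake=True)

# Prediction-identical: both theories match the observed behaviour...
assert predicted_action(**theory_a) == OBSERVED_ACTION
assert predicted_action(**theory_b) == OBSERVED_ACTION

# ...but an AI maximising the inferred "real preferences" (and correcting
# the supposed mistake) acts very differently under the two theories.
def ai_policy(theory):
    return "offer heroin" if theory["prefers_heroin"] else "keep heroin away"

print(ai_policy(theory_a))  # keep heroin away
print(ai_policy(theory_b))  # offer heroin
```

No amount of behavioural data distinguishes theory A from theory B here; the choice between them has to come from somewhere other than prediction accuracy, which is exactly the degree of freedom described above.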