# Probabilities, weights, sums: pretty much the same for reward functions

This is a minor post, put up mainly so I can reference it in other posts.

# Probabilities, weights, and expectations

You’re an agent, with potential uncertainty over your reward function. You know you have to maximise

$$0.5 R_1 + 0.5 R_2,$$

where $R_1$ and $R_2$ are reward functions. What do you do?

Well, how do we interpret the $0.5$s? Are they probabilities for which reward function is right? Or are they weights, telling you the relative importance of each one? Well, in fact:

• If you won’t be learning any more information to help you distinguish between reward functions, then weights and probabilities play the same role.

Thus, if you don’t expect to learn any more reward-function-relevant information, maximising reward given probabilities $0.5$ on each of $R_1$ and $R_2$ is the same as maximising the single reward function $0.5 R_1 + 0.5 R_2$.

So, if we denote probabilities in bold, maximising the following (given no reward-function learning) are all equivalent:

$$\mathbf{0.5} R_1 + \mathbf{0.5} R_2, \qquad \mathbf{0.5} R_1 + 0.5 R_2, \qquad 0.5 R_1 + \mathbf{0.5} R_2, \qquad 0.5 R_1 + 0.5 R_2.$$
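As a minimal sketch of this equivalence (the actions and reward values below are assumed toy numbers, not from the post), reading the $0.5$ coefficients as probabilities or as weights picks out the same optimal action:

```python
# Toy setup (assumed for illustration): two reward functions over three
# actions, combined with coefficients 0.5 and 0.5.
R1 = {"a": 1.0, "b": 0.0, "c": 0.4}
R2 = {"a": 0.0, "b": 0.8, "c": 0.7}
actions = list(R1)

# Probability reading: expected reward under uncertainty over {R1, R2}.
def expected_reward(action, p1=0.5, p2=0.5):
    return p1 * R1[action] + p2 * R2[action]

# Weight reading: a single combined reward function R = 0.5*R1 + 0.5*R2.
R = {a: 0.5 * R1[a] + 0.5 * R2[a] for a in actions}

best_by_probability = max(actions, key=expected_reward)
best_by_weight = max(actions, key=R.get)

# With no further learning, both readings pick the same optimal action.
assert best_by_probability == best_by_weight == "c"
```

The two readings only come apart once the agent can gain information that shifts the coefficients, which is exactly the case the bullet point excludes.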

Now, given a probability distribution $\mu$ over reward functions, we can take its expectation $\mathbb{E}(\mu)$. You can define this by talking about affine spaces and so on, but the simple version is: to take an expectation, rewrite every probability as a weight. So the result becomes:

• If you won’t be learning any more information to help you distinguish between reward functions, then distributions with the same expectation are equivalent.
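A small sketch of this point (toy reward functions assumed, not from the post): two different distributions over reward functions with the same expectation, which an agent that learns nothing further cannot distinguish:

```python
# Toy reward functions (assumed for illustration).
R1 = {"a": 1.0, "b": 0.0}
R2 = {"a": 0.0, "b": 1.0}
R3 = {"a": 0.5, "b": 0.5}  # pointwise, R3 = 0.5*R1 + 0.5*R2

def expectation(dist):
    """Expectation of a distribution over reward functions: rewrite each
    probability as a weight and sum the weighted reward functions."""
    out = {}
    for R, p in dist:
        for action, r in R.items():
            out[action] = out.get(action, 0.0) + p * r
    return out

mu1 = [(R1, 0.5), (R2, 0.5)]  # 50/50 uncertainty between R1 and R2
mu2 = [(R3, 1.0)]             # certainty in R3

# Different distributions, same expectation -> equivalent for the agent.
assert expectation(mu1) == expectation(mu2) == {"a": 0.5, "b": 0.5}
```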

# Expected evidence and unriggability

We’ve defined an unriggable learning process as one that respects conservation of expected evidence.

Now, conservation of expected evidence is about expectations. It basically says that, if $\pi_1$ and $\pi_2$ are two policies the agent could take, then for the probability distribution $\mu$,

$$\mathbb{E}^{\pi_1}(\mu) = \mathbb{E}^{\pi_2}(\mu).$$

Suppose that $\mu$ is in fact riggable, and that we wanted to “correct” it to make it unriggable. Then we would want to add a correction term for any policy $\pi$. If we took $\pi_0$ as a “default” policy, we could add a correction term to $\mu$:

$$\mu \;\longrightarrow\; \mu + \mathbb{E}^{\pi_0}(\mu) - \mathbb{E}^{\pi}(\mu).$$

This would have the required unriggability properties. But how do you add to a probability distribution, and how do you subtract from it?

But recall that unriggability only cares about expectations, and expectations treat probabilities as weights. Adding weighted reward functions is perfectly fine. Generally there will be multiple ways of doing this, mixing probabilities and weights.
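As a sketch of how such a correction can work (the learning process below is an assumed toy, not from the post): the agent’s final distribution over $R_1$ and $R_2$ depends on its policy, so the process is riggable; adding the correction term as a weighted sum of reward functions makes every policy’s expectation agree with the default policy’s:

```python
# Toy riggable process (assumed): the probability the agent ends up
# assigning to R1 depends on which policy it follows.
R1 = {"a": 1.0, "b": 0.0}
R2 = {"a": 0.0, "b": 1.0}
prob_R1 = {"pi_0": 0.5, "pi_1": 0.75, "pi_2": 0.25}  # pi_0 is the default

def E(policy):
    """Expectation of mu under a policy, as a single reward function."""
    p = prob_R1[policy]
    return {a: p * R1[a] + (1 - p) * R2[a] for a in R1}

def corrected_E(policy, default="pi_0"):
    """Add the correction term E^{default}(mu) - E^{policy}(mu); the
    addition and subtraction happen on weights, not probabilities."""
    base, target = E(policy), E(default)
    return {a: base[a] + (target[a] - base[a]) for a in base}

# Riggable before the correction, unriggable after it:
assert E("pi_1") != E("pi_2")
assert corrected_E("pi_1") == corrected_E("pi_2") == E("pi_0")
```

The subtraction step is what has no interpretation as a probability update: negative coefficients are fine for weights, but not for probabilities.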

For example, if $\mu = \mathbf{0.5} R_1 + \mathbf{0.5} R_2$ and the correction term is $0.5 R_1 - 0.5 R_2$, then we can map $\mu$ to

1. $\mathbf{1} R_1 + \mathbf{0} R_2$,

2. $\mathbf{0.5} (2 R_1) + \mathbf{0.5} (0 \cdot R_2)$,

3. $\mathbf{0.5} R_1' + \mathbf{0.5} R_2'$ with $R_1' = 2 R_1 - R_2$ and $R_2' = R_2$,

4. $R_1$,

5. and many other options...
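This multiplicity can be checked directly. A sketch (toy reward functions assumed), with several probability/weight mixtures that all share the expectation $R_1$:

```python
R1 = {"a": 1.0, "b": 0.0}  # toy reward functions (assumed)
R2 = {"a": 0.0, "b": 1.0}

def mix(*terms):
    """Weighted sum of reward functions; terms are (coefficient, reward)."""
    out = {a: 0.0 for a in R1}
    for c, R in terms:
        for a in out:
            out[a] += c * R[a]
    return out

zero = {a: 0.0 for a in R1}            # the zero reward function
double_R1 = mix((2.0, R1))             # 2*R1
R1_shift = mix((2.0, R1), (-1.0, R2))  # 2*R1 - R2

representations = [
    mix((1.0, R1), (0.0, R2)),           # probabilities 1 and 0
    mix((0.5, double_R1), (0.5, zero)),  # keep the 0.5s, rescale rewards
    mix((0.5, R1_shift), (0.5, R2)),     # 0.5s on shifted reward functions
    mix((1.0, R1)),                      # a single pure weight
]

# All of them reduce to the same expectation, R1.
assert all(rep == R1 for rep in representations)
```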

This multiplicity of possibilities is what I was trying to deal with in my old post about reward function translations.

---

> You know you have to maximise $0.5 R_1 + 0.5 R_2$

Can I have a little more detail on the setup? Is it a fair restatement to say: you’re an agent, with a static reward function which you do not have direct access to. Omega (God, your creator, someone infallible and honest) has told you that $0.5 R_1 + 0.5 R_2$ is reducible to your reward function, somehow, and you are not capable of experimenting or observing anything that would disambiguate this.

Now, as an actual person, I’d probably say “Fuck you, God, I’m running the experiment. I’ll do something that generates different $R_1$ and $R_2$, measure my reward, and now I know my weighting.”

In the case of an artificially-limited agent, who isn’t permitted to actually update based on experience, you’re right that it doesn’t matter: probability _is_ weight for uncertain outcomes. But you have an unnecessary indirection with “respects conservation of expected evidence.” You can just say “unable to update this belief”.