Probabilities, weights, sums: pretty much the same for reward functions

This is a minor post, which I’m putting up to reference in other posts.

Probabilities, weights, and expectations

You’re an agent, with potential uncertainty over your reward function. You know you have to maximise

$$\frac{2}{3}\, R_0 + \frac{1}{3}\, R_1,$$

where $R_0$ and $R_1$ are reward functions. What do you do?

Well, how do we interpret the $\frac{2}{3}$ and $\frac{1}{3}$? Are they probabilities for which reward function is right? Or are they weights, telling you the relative importance of each one? In fact:

  • If you won’t be learning any more information to help you distinguish between reward functions, then weights and probabilities play the same role.

Thus, if you don’t expect to learn any more reward-function-relevant information, maximising reward given the probabilities $\frac{2}{3}$ and $\frac{1}{3}$ is the same as maximising the single reward function $R = \frac{2}{3}\, R_0 + \frac{1}{3}\, R_1$.
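
To make this concrete, here is a minimal sketch in Python (the reward vectors are invented for illustration, not from the post): whether we read $\frac{2}{3}$ and $\frac{1}{3}$ as probabilities or as weights, the agent computes the same value for each action, and so picks the same action.

```python
import numpy as np

# Two illustrative reward functions over three possible actions
# (all numbers invented for this sketch).
R0 = np.array([1.0, 0.0, 0.5])
R1 = np.array([0.0, 1.0, 0.8])

# Reading 1: 2/3 and 1/3 are *probabilities*, so the agent maximises
# its expected reward, which for each action is 2/3 * R0 + 1/3 * R1.
expected_reward = (2 / 3) * R0 + (1 / 3) * R1

# Reading 2: 2/3 and 1/3 are *weights*, so the agent maximises the
# single reward function R = 2/3 * R0 + 1/3 * R1, the very same array.
R = (2 / 3) * R0 + (1 / 3) * R1

# With no further learning, the two readings are indistinguishable:
assert np.array_equal(expected_reward, R)
print("optimal action under either reading:", int(np.argmax(R)))
```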

So, if we denote probabilities in bold, the following maximisation targets (given no reward-function learning) are all equivalent:

$$\mathbf{\frac{2}{3}}\, R_0 + \mathbf{\frac{1}{3}}\, R_1, \qquad \frac{2}{3}\, R_0 + \frac{1}{3}\, R_1, \qquad \mathbf{\frac{1}{2}}\left(\frac{4}{3}\, R_0\right) + \mathbf{\frac{1}{2}}\left(\frac{2}{3}\, R_1\right).$$

Now, given a probability distribution $\mu$ over reward functions, we can take its expectation $\mathbb{E}[\mu] = \sum_i \mu(R_i)\, R_i$. You can define this by talking about affine spaces and so on, but the simple version of it is: to take an expectation, rewrite every probability as a weight. So the result becomes:

  • If you won’t be learning any more information to help you distinguish between reward functions, then distributions with the same expectation are equivalent.
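
As a quick illustration of this bullet point (a sketch with invented numbers): a 50/50 distribution over $R_0$ and $R_1$, and a distribution that is certain about the single averaged reward function, have the same expectation, so they are interchangeable for the agent.

```python
import numpy as np

R0 = np.array([1.0, 0.0, 0.5])  # illustrative reward vectors over actions
R1 = np.array([0.0, 1.0, 0.8])

def expectation(dist):
    """Expected reward of a distribution given as (probability, reward) pairs."""
    return sum(p * R for p, R in dist)

mu1 = [(0.5, R0), (0.5, R1)]        # 50/50 over R0 and R1
mu2 = [(1.0, 0.5 * R0 + 0.5 * R1)]  # all mass on the averaged reward

# Same expectation, hence (absent further learning) the same optimal action:
assert np.allclose(expectation(mu1), expectation(mu2))
assert np.argmax(expectation(mu1)) == np.argmax(expectation(mu2))
```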

Expected evidence and unriggability

We’ve defined an unriggable learning process as one that respects conservation of expected evidence.

Now, conservation of expected evidence is about expectations. It basically says that, if $\pi_1$ and $\pi_2$ are two policies the agent could take, then for the probability distribution $\mu$,

$$\mathbb{E}[\mu \mid \pi_1] = \mathbb{E}[\mu \mid \pi_2].$$
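
In a toy model (all numbers invented for this sketch), a learning process can be summarised, for each policy, by the chance of each observation and the posterior over reward functions that the observation produces. Conservation of expected evidence then says the expected posterior must not depend on the policy:

```python
import numpy as np

# For each policy: a list of (chance of observation, posterior over {R0, R1}).
# pi_1 learns decisively; pi_2 learns nothing.  Numbers are invented.
process = {
    "pi_1": [(0.5, np.array([1.0, 0.0])),   # obs A: certain it's R0
             (0.5, np.array([0.0, 1.0]))],  # obs B: certain it's R1
    "pi_2": [(1.0, np.array([0.5, 0.5]))],  # no information either way
}

def expected_posterior(policy):
    return sum(chance * post for chance, post in process[policy])

# Conservation of expected evidence: E[mu | pi_1] == E[mu | pi_2].
# Both come out to (0.5, 0.5), so this toy process is unriggable.
assert np.allclose(expected_posterior("pi_1"), expected_posterior("pi_2"))
```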

Suppose that $\mu$ is in fact riggable, and that we wanted to “correct” it to make it unriggable. Then we would want to add a correction term for any policy $\pi$. If we took $\pi_0$ as a “default” policy, we could add a correction term to $\mu$:

$$\mu^\pi = \mu + \big(\mathbb{E}[\mu \mid \pi_0] - \mathbb{E}[\mu \mid \pi]\big).$$

This would have the required unriggability property: the expectation of $\mu^\pi$ is $\mathbb{E}[\mu \mid \pi_0]$, whatever the policy $\pi$. But how do you add to a probability distribution, and how do you subtract from it?

But recall that unriggability only cares about expectations, and expectations treat probabilities as weights. Adding and subtracting weighted reward functions is perfectly fine, even when the resulting weights are negative or don’t sum to one. Generally there will be multiple ways of doing this, mixing probabilities and weights.
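
Here is the correction at work in the toy model above (invented numbers again): the corrected “distribution” picks up a negative entry, which is meaningless as a probability but perfectly serviceable as a weight.

```python
import numpy as np

# A riggable process: the default policy pi_0 has expected posterior
# (0.5, 0.5) over {R0, R1}, but policy pi drives it to (0.9, 0.1).
default_expected = np.array([0.5, 0.5])      # E[mu | pi_0]
pi_branches = [(0.9, np.array([1.0, 0.0])),  # obs A: posterior (1, 0)
               (0.1, np.array([0.0, 1.0]))]  # obs B: posterior (0, 1)
pi_expected = sum(c * post for c, post in pi_branches)  # (0.9, 0.1)

# Add the correction term E[mu | pi_0] - E[mu | pi] to each branch:
correction = default_expected - pi_expected             # (-0.4, 0.4)
corrected = [(c, post + correction) for c, post in pi_branches]

# The expectation is now policy-independent, as required...
assert np.allclose(sum(c * post for c, post in corrected), default_expected)

# ...but the second branch now carries (-0.4, 1.4): not a probability
# distribution, yet fine as the weighted reward -0.4 * R0 + 1.4 * R1.
print([post for _, post in corrected])
```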

For example, if $\mu(R_0) = \frac{2}{3}$ and $\mu(R_1) = \frac{1}{3}$, then we can map $\mu$ to:

  1. $\mathbf{\frac{2}{3}}\, R_0 + \mathbf{\frac{1}{3}}\, R_1$ (pure probabilities),

  2. $\frac{2}{3}\, R_0 + \frac{1}{3}\, R_1$ (pure weights),

  3. $\mathbf{1}\cdot R'$ with $R' = \frac{2}{3}\, R_0 + \frac{1}{3}\, R_1$,

  4. $\mathbf{\frac{1}{2}}\left(\frac{4}{3}\, R_0\right) + \mathbf{\frac{1}{2}}\left(\frac{2}{3}\, R_1\right)$,

  5. and many other options...
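
As a numerical sanity check (reusing the illustrative reward vectors from earlier; only the options themselves come from the list above), all of these representations define the same expected reward:

```python
import numpy as np

R0 = np.array([1.0, 0.0, 0.5])  # illustrative reward vectors
R1 = np.array([0.0, 1.0, 0.8])

# Each option: (probability, reward function) pairs, with plain weights
# folded into the reward functions themselves.
options = [
    [(2/3, R0), (1/3, R1)],                  # 1. pure probabilities
    [(1.0, (2/3) * R0 + (1/3) * R1)],        # 2. pure weights (held with certainty)
    [(1.0, (2/3) * R0 + (1/3) * R1)],        # 3. probability 1 on R'
    [(1/2, (4/3) * R0), (1/2, (2/3) * R1)],  # 4. rescaled rewards, 50/50 split
]

expectations = [sum(p * R for p, R in opt) for opt in options]
for e in expectations[1:]:
    assert np.allclose(e, expectations[0])  # all four agree
```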

This multiplicity of possibilities is what I was trying to deal with in my old post about reward function translations.