I was actually going to write something like this up, but you beat me to it! My idea was pretty similar. The main difference is that in my setting, a utility function only cares about outcomes (rather than intermediate events such as X). Here’s most of what I had written so far (feel free to just skim it, given that it’s mostly the same thing):
I am interested in the question of when we can substitute changes in beliefs for changes in
utility function and vice versa.
Here is a simple setting of utility maximization. There are n possible worlds, 2 actions, and m outcomes.
An agent has some prior p over possible worlds, and some utility function u over outcomes.
The agent can be exposed to different independent experiments. In an experiment, the agent
does not know the possible world ω, but observes it indirectly: it sees an observation O drawn with probability P(O|ω).
Then the agent chooses an action A∈{0,1}. Subsequently, some outcome R will occur
(stochastically) based on the possible world and action, and the agent values this outcome
according to u.
Let M be the “transition difference matrix” defined as
M_ij = P(R=j | ω=i, A=1) − P(R=j | ω=i, A=0).
It is not unreasonable to believe that M can be determined a priori: if possible worlds are
(say) probabilistic Turing machines accepting an action as input and
returning an outcome as output, then indeed M is known a priori.
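As a concrete sketch (all numbers hypothetical), M can be assembled in numpy from the two conditional outcome distributions:

```python
import numpy as np

# Hypothetical example: n = 3 possible worlds, m = 4 outcomes.
# P0[i, j] = P(R = j | world = i, A = 0), and similarly P1 for A = 1.
P0 = np.array([[0.70, 0.10, 0.10, 0.10],
               [0.20, 0.50, 0.20, 0.10],
               [0.10, 0.10, 0.10, 0.70]])
P1 = np.array([[0.10, 0.20, 0.30, 0.40],
               [0.40, 0.30, 0.20, 0.10],
               [0.25, 0.25, 0.25, 0.25]])

# Transition difference matrix M_ij = P(R=j | w=i, A=1) - P(R=j | w=i, A=0).
M = P1 - P0

# Each row of M is a difference of two probability distributions,
# so every row sums to zero.
assert np.allclose(M.sum(axis=1), 0.0)
```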
Let x∗y stand for the elementwise product between vectors x and y,
and let x÷y stand for the elementwise quotient.
Also, define the likelihood l(o) to be a vector with l_i(o) = P(O=o | ω=i). The
agent’s posterior distribution over the possible world after seeing the observation O=o will then be proportional to p∗l(o), by Bayes’ rule.
We can compute the expected utility difference between actions in an experiment as
E[u(R) | O=o, A=1] − E[u(R) | O=o, A=0] ∝ (p∗l(o))ᵀMu
The agent’s behavior across experiments is entirely determined by the function l ↦ (p∗l)ᵀMu.
Let us rewrite this as:
(p∗l)ᵀMu = (diag(p)l)ᵀMu = lᵀdiag(p)Mu = lᵀ(p∗Mu) = (p∗Mu)ᵀl
So in fact, the behavior is determined by the vector p∗Mu.
This vector has
n entries, and entry i is equal to the prior probability of possible world i times
the expected utility difference (between actions 1 and 0) if we are in
possible world i. We could call this vector p∗Mu the policy of (p,u).
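The identity is easy to check numerically; here is a self-contained sketch with randomly generated (hypothetical) ingredients:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4  # worlds, outcomes

# Random outcome distributions for each action, one row per world.
P0 = rng.dirichlet(np.ones(m), size=n)
P1 = rng.dirichlet(np.ones(m), size=n)
M = P1 - P0                    # transition difference matrix

p = rng.dirichlet(np.ones(n))  # prior over possible worlds
u = rng.normal(size=m)         # utility over outcomes
l = rng.uniform(size=n)        # likelihood vector, l_i = P(O=o | world i)

policy = p * (M @ u)           # the "policy" vector p * Mu

# The (unnormalized) expected utility difference, computed both ways:
lhs = (p * l) @ (M @ u)        # (p * l)^T M u
rhs = policy @ l               # (p * Mu)^T l
assert np.isclose(lhs, rhs)
```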
Now we can ask two interesting questions:
For what (p,u,M,p′) does there exist u′ such that the policy of (p,u) is the policy of (p′,u′)? That is, can we replace changes in beliefs with changes in utility function?
For what (p,u,M,u′) does there exist p′ such that the policy of (p,u) is the policy of (p′,u′)? That is, can we replace changes in utility function with changes in beliefs?
For question 1:
p′∗Mu′ = p∗Mu ⇔ diag(p′)Mu′ = diag(p)Mu ⇔ Mu′ = diag(p′)⁻¹diag(p)Mu ⇔ Mu′ = diag(p÷p′)Mu ⇔ Mu′ = (Mu)∗(p÷p′)
We can certainly find u′ satisfying this if p′ assigns probability 0 only to
possible worlds to which p also assigns probability 0, and M is right-invertible. That is,
p′ must not have a smaller support than p (so that p÷p′ is well-defined on the support of p), and
the by-action differences in outcome distributions for each possible world (the rows of M)
must be linearly independent, so that the columns of M span Rⁿ. Since each row of M sums to zero,
this requires strictly more outcomes than possible worlds; when there are more possible worlds
than outcomes, M cannot be right-invertible, and u′ exists only when (p÷p′)∗Mu happens to lie
in the column space of M.
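A minimal numerical sketch of this construction (hypothetical numbers, chosen with more outcomes than worlds so that M can have full row rank): solve Mu′ = (p÷p′)∗Mu with the pseudoinverse and check that the two (prior, utility) pairs induce the same policy.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 6  # more outcomes than worlds, so rank-n M is possible

P0 = rng.dirichlet(np.ones(m), size=n)
P1 = rng.dirichlet(np.ones(m), size=n)
M = P1 - P0
assert np.linalg.matrix_rank(M) == n   # M is right-invertible

p = rng.dirichlet(np.ones(n))          # original prior
p_new = rng.dirichlet(np.ones(n))      # new prior (full support)
u = rng.normal(size=m)                 # original utility function

# Solve M u' = (p / p') * (M u) via the pseudoinverse.
target = (p / p_new) * (M @ u)
u_new = np.linalg.pinv(M) @ target

# (p', u') has the same policy as (p, u), so it behaves identically
# across all experiments.
assert np.allclose(p_new * (M @ u_new), p * (M @ u))
```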
For question 2: this is not possible in general. If for some possible world i, (Mu)_i has a different sign from (Mu′)_i (that is, u and u′ recommend different actions in possible world i), then the policies must differ unless p_i = 0. So in general we cannot replace changes in utility function with changes in beliefs. There may be a way to do this by making some possible worlds observationally equivalent, but I haven’t worked through the details.
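The obstruction is easy to see concretely (hypothetical numbers; taking u′ = −u flips the recommended action in every world, so no nonnegative prior can reproduce the policy in any world where it is nonzero):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4

P0 = rng.dirichlet(np.ones(m), size=n)
P1 = rng.dirichlet(np.ones(m), size=n)
M = P1 - P0

p = np.full(n, 1.0 / n)   # full-support prior
u = rng.normal(size=m)
u_new = -u                # flips the recommended action in every world

policy = p * (M @ u)

# For any prior p', entry i of p' * (M @ u_new) is either zero or has the
# opposite sign of policy[i], so the policies cannot agree wherever
# policy[i] != 0.
i = int(np.argmax(np.abs(policy)))
assert policy[i] * (M @ u_new)[i] < 0
```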