The VNM theorem is best understood as an operator that applies to a function Comp(Wi,Wj)→{Wi<Wj,Wi>Wj,Wi∼Wj} that obeys the axioms and rewrites that function in the form Comp(Wi,Wj)=Compare(E[U(Wi)],E[U(Wj)]) where U is the resulting “utility function” producing a real number. So it rewrites your function into one that compares “expected utilities”.

To apply this to something in the real world, a human or an AI, one must decide exactly what Comp refers to and how (>,<,∼) are interpreted.
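As a rough illustration of the rewritten form, here is a minimal Python sketch. The names `Lottery`, `expected_utility` and `comp`, the dict representation of lotteries, and the utility values are all assumptions made for the example, not part of the theorem.

```python
from typing import Callable, Dict

# Hypothetical representation: outcome name -> probability.
Lottery = Dict[str, float]

def expected_utility(lottery: Lottery, U: Callable[[str], float]) -> float:
    """Probability-weighted average of U over the lottery's outcomes."""
    return sum(p * U(outcome) for outcome, p in lottery.items())

def comp(wi: Lottery, wj: Lottery, U: Callable[[str], float]) -> str:
    """Compare two lotteries by expected utility; returns '<', '>' or '~'."""
    eu_i = expected_utility(wi, U)
    eu_j = expected_utility(wj, U)
    if eu_i < eu_j:
        return "<"
    if eu_i > eu_j:
        return ">"
    return "~"  # exact equality; a tolerance may be preferable with real data

# Illustrative utilities, not derived from any real agent.
utilities = {"vanilla": 1.0, "chocolate": 2.0, "nothing": 0.0}
U = utilities.__getitem__

sure_vanilla = {"vanilla": 1.0}
coin_flip = {"chocolate": 0.5, "nothing": 0.5}
result = comp(sure_vanilla, coin_flip, U)
```

With these made-up numbers both lotteries have expected utility 1.0, so the comparison comes out as indifference.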

- We can interpret Comp as the actual revealed choices of the agent, i.e. when put in a position to take action to cause either Wi or Wj to happen, what does it do? If the agent’s thinking doesn’t terminate (within the allotted time), or it chooses randomly, we can interpret that as ∼. The possibilities are fully enumerated, so completeness holds. However, you will find that any real agent fails to obey some of the other axioms.

- We can interpret Comp as the expressed preferences of the agent. That is to say, present the hypothetical and ask what the agent prefers. Then we say that Wi<Wj if the agent says they prefer Wj, that Wi>Wj if the agent says they prefer Wi, and that Wi∼Wj if the agent says they are equal or can’t decide (within the allotted time). Again completeness holds, but again you will find that any real agent fails some of the other axioms.

- In the case of humans, we can interpret Comp as some extrapolated volition of a particular human, in which case we say that Wi<Wj if the person would choose Wj if only they thought faster, knew more, were smarter, were more the person they wished they would be, etc. One might fancifully describe this as defining Comp as the person’s “true preferences”. This is not a practical interpretation, since we don’t know how to compute extrapolated volition in the general case. But it’s perfectly mathematically valid, and it’s not hard to see how it could be defined so that completeness holds. It’s plausible that the other axioms could hold too: most people consider the rationality axioms generally desirable to conform to, so “more the person they wished they would be” plausibly points in a direction that results in such rationality.

- For some AIs whose source code we have access to, we might be able to just read the source code and define Comp using the actual code that computes preferences.
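To make the revealed-choice reading concrete, here is a small Python sketch. The `observed` table and the option names are hypothetical. Comp is built from recorded pairwise choices, with timeouts and random picks mapped to ∼, so every pair gets an answer, yet transitivity can still fail.

```python
from itertools import permutations

# Hypothetical observations: the option the agent picked for each pair,
# or None if it timed out or chose randomly.
observed = {
    frozenset({"A", "B"}): "A",
    frozenset({"B", "C"}): "B",
    frozenset({"A", "C"}): "C",  # a cyclic choice, made up for the example
}

def comp(wi, wj):
    """Return '>', '<' or '~' for the pair (wi, wj) from the observed choices."""
    if wi == wj:
        return "~"
    choice = observed.get(frozenset({wi, wj}))
    if choice == wi:
        return ">"
    if choice == wj:
        return "<"
    # None: timeout or random choice (an unrecorded pair is also treated
    # this way here, which is an assumption of the sketch).
    return "~"

def transitivity_violations(options):
    """List triples (x, y, z) with x > y and y > z but not x > z."""
    return [
        (x, y, z)
        for x, y, z in permutations(options, 3)
        if comp(x, y) == ">" and comp(y, z) == ">" and comp(x, z) != ">"
    ]

options = ["A", "B", "C"]
violations = transitivity_violations(options)
```

Here `comp` always returns one of the three symbols, which is the sense in which completeness holds by construction, while `violations` is non-empty because the made-up choices form a cycle.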

There are a lot of variables here. One could interpret the domain of Comp as being a restricted set of lotteries. This is the likely interpretation in something like a psychology experiment where we are constrained to only asking about different flavours of ice cream or something. In that case the resulting utility function will only be valid in this particular restricted domain.

I was going to say the same thing as the first bullet point here—you can interpret the preference ordering as “If you were to give the agent two buttons that could cause world state 1 and world state 2 respectively, which would it choose?” (Indifference could be modeled as a third button which chooses randomly.) This gives you a definition of the full preference ordering which is complete by construction.

In practice, you only need to have utilities over world states you actually have to decide between, but I think the VNM utility theorem will apply in the same way to the world states which you actually care about.

Thanks for this response. On notation: I want world-states, Wi, to be specific outcomes rather than random variables. As such, U(Wi) is a real number, and the expectation of a real number could only be defined as itself: E[U(Wi)]=U(Wi) in all cases. I left aside all the discussion of ‘lotteries’ in the VNM Wikipedia article, though maybe I ought not have done so.
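For reference, in the lottery formulation set aside above, a lottery L yields outcome W_k with probability p_k, and its expected utility is the weighted sum below. The symbols L and p_k are introduced here only for illustration. A specific outcome W_i corresponds to the degenerate lottery with p_i = 1, which recovers E[U(Wi)]=U(Wi).

```latex
E[U(L)] = \sum_{k} p_k \, U(W_k), \qquad p_k \ge 0, \quad \sum_{k} p_k = 1,
\qquad\text{and if } p_i = 1:\quad E[U(L)] = U(W_i).
```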

I think your first two bullet points are wrong. We can’t reasonably interpret ~ as ‘the agent’s thinking doesn’t terminate’. ~ refers to indifference between two options, so if A>B>C and P ~ B, then A>P>C. Equating ‘unable to decide between two options’ and ‘two options are equally preferable’ will lead to a contradiction or a trivial case when combined with transitivity. I can cook up something more explicit if you’d like?

There’s a similar problem with ~ meaning ‘the agent chooses randomly’, provided the random choice isn’t prompted by equality of preferences.
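One possible way to make this objection explicit is sketched below. It is a toy illustration, not a full proof, and the option names and the `strict` and `undecided` tables are hypothetical. Suppose the agent strictly prefers A to C but fails to decide between P and A, and between P and C, within the time limit. Reading both failures as ∼ gives A ∼ P and P ∼ C, so transitivity of indifference would force A ∼ C, contradicting A > C.

```python
from itertools import permutations

# Hypothetical strict preferences the agent does express: (x, y) means x > y.
strict = {("A", "B"), ("B", "C"), ("A", "C")}
# Hypothetical pairs on which the agent fails to decide in time.
undecided = {frozenset({"P", "A"}), frozenset({"P", "C"})}

def comp(x, y):
    """Return '>', '<' or '~', reading 'failed to decide' as indifference."""
    if (x, y) in strict:
        return ">"
    if (y, x) in strict:
        return "<"
    if x == y or frozenset({x, y}) in undecided:
        return "~"
    raise KeyError(f"no data for pair ({x}, {y})")

def indifference_violations(options):
    """List triples (x, y, z) with x ~ y and y ~ z but not x ~ z."""
    return [
        (x, y, z)
        for x, y, z in permutations(options, 3)
        if comp(x, y) == "~" and comp(y, z) == "~" and comp(x, z) != "~"
    ]

# B is left out because the sketch specifies no data for the pair (P, B).
violations = indifference_violations(["A", "P", "C"])
```

The non-empty `violations` list shows the clash: the triple (A, P, C) has A ∼ P and P ∼ C while A > C.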

This comment has sharpened my thinking, and it would be good for me to directly prove my claims above—will edit if I get there.
