Could you define the “Cake or Death problem” and give an example of a decision-making system that falls prey to it?

First nitpick: Since the sum over i (i just being an index I’m using to number utility functions) of u_i(w)·p(C(u_i)|w) depends only on w, it’s really just a complicatedly-written utility function. I think you want u_i(w)·p(C(u_i)|w, e), which would allow the agent to gain some sort of evidence about its utility function. Also, since C(u_i) is presumably supposed to represent a fixed logical thingamabob, to be super-precise we could talk about some logical uncertainty measure over whether the utility function is correct, M(u_i, w, e), rather than a probability, but I think we don’t have to care about that.
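To make the first nitpick concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the two worlds, the two candidate utility functions, the evidence labels, and all the probabilities are hypothetical, not taken from the original post. It just shows that the sum without e gives one fixed number per world, while conditioning on e lets the same world be scored differently depending on what the agent has learned.

```python
# Toy illustration only: worlds, utilities, evidence labels and numbers are hypothetical.
UTILITIES = {
    "u_cake": {"w1": 1.0, "w2": 0.0},   # candidate utility function that likes w1
    "u_death": {"w1": 0.0, "w2": 1.0},  # candidate utility function that likes w2
}

# p(C(u_i) | w): probability that u_i is the correct utility function, given only w.
P_CORRECT_GIVEN_W = {
    "w1": {"u_cake": 0.5, "u_death": 0.5},
    "w2": {"u_cake": 0.5, "u_death": 0.5},
}

# p(C(u_i) | w, e): the same thing, but also conditioned on evidence e.
P_CORRECT_GIVEN_W_E = {
    ("w1", "told_cake"): {"u_cake": 1.0, "u_death": 0.0},
    ("w2", "told_cake"): {"u_cake": 1.0, "u_death": 0.0},
    ("w1", "told_death"): {"u_cake": 0.0, "u_death": 1.0},
    ("w2", "told_death"): {"u_cake": 0.0, "u_death": 1.0},
}


def combined_without_evidence(w):
    """Sum over i of u_i(w) * p(C(u_i) | w). Depends on w and nothing else."""
    return sum(UTILITIES[i][w] * P_CORRECT_GIVEN_W[w][i] for i in UTILITIES)


def combined_with_evidence(w, e):
    """Sum over i of u_i(w) * p(C(u_i) | w, e). Can change as e changes."""
    return sum(UTILITIES[i][w] * P_CORRECT_GIVEN_W_E[(w, e)][i] for i in UTILITIES)


if __name__ == "__main__":
    for w in ("w1", "w2"):
        print(w, combined_without_evidence(w))
        for e in ("told_cake", "told_death"):
            print(w, e, combined_with_evidence(w, e))
```

In the first function there is nothing for the agent to learn: it is an ordinary utility function over worlds, just written in a roundabout way.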

Second nitpick: To see what happens, let’s assume our agent has figured out its utility function. It now picks the action with the largest sum over w of p(w|e, a)·u(w), where “w” is a world describing present, past and future, and u(w) is its one true utility function. This happens to look a lot like an evidential decision theory (EDT) agent, which runs into known problems. For example, if there were a disease that had low utility but made you unable to punch yourself in the face, an EDT agent would want to punch itself in the face, since doing so raises the probability that it doesn’t have the disease.
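A rough sketch of the punching example, in Python. All the numbers are assumptions I picked for illustration (the prior on the disease, how often healthy agents punch themselves, the utilities), and the "fixed prior" chooser is only a contrast case I added, not something from the original post. A world here is just the pair (has_disease, punch), so the sum over w of p(w|e, a)·u(w) reduces to a sum over whether the agent is sick.

```python
# Toy model of the disease example; every number below is a hypothetical assumption.
P_DISEASE = 0.5                          # prior probability of having the disease
P_PUNCH_GIVEN = {True: 0.0, False: 0.5}  # p(punch | has_disease); the sick cannot punch
U_DISEASE = -100.0                       # utility cost of having the disease
U_PUNCH = -1.0                           # utility cost of punching yourself


def p_disease_given_action(punch):
    """p(disease | action) by Bayes' rule, treating the action as evidence."""
    def likelihood(has_disease):
        p = P_PUNCH_GIVEN[has_disease]
        return p if punch else 1.0 - p

    joint_sick = P_DISEASE * likelihood(True)
    joint_healthy = (1.0 - P_DISEASE) * likelihood(False)
    return joint_sick / (joint_sick + joint_healthy)


def utility(has_disease, punch):
    """u(w) for the world w = (has_disease, punch)."""
    return (U_DISEASE if has_disease else 0.0) + (U_PUNCH if punch else 0.0)


def expected_utility(punch, p_sick):
    """Sum over the two possible worlds of p(w) * u(w) for a given action."""
    return p_sick * utility(True, punch) + (1.0 - p_sick) * utility(False, punch)


def edt_choice():
    """Pick the action maximizing the sum over w of p(w | a) * u(w)."""
    return max([True, False],
               key=lambda a: expected_utility(a, p_disease_given_action(a)))


def fixed_prior_choice():
    """Contrast case: the action is not allowed to shift p(disease)."""
    return max([True, False], key=lambda a: expected_utility(a, P_DISEASE))


if __name__ == "__main__":
    print("EDT-style agent punches itself:", edt_choice())
    print("Fixed-prior agent punches itself:", fixed_prior_choice())
```

Under these assumed numbers the EDT-style chooser punches itself, because conditioning on the punch drives p(disease) to zero, while the fixed-prior chooser sees only the cost of the punch and declines.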

I’ll post the “cake or death” problem in a post soon.

This one?

(Remember: always give your esoteric philosophical conundra good names.)

Oh, okay, thanks. So, shallowly speaking, you just needed to multiply the utilities of the strategies “don’t ask and pick cake” and “don’t ask and pick death” by 0.5.

Yep! :-)