It has been many years since I thought about this post, so what I say below could be wrong, but here’s my current understanding:
I think what I was trying to say in the post is that FDT-policy returns a policy, so “FDT(P, x)” means “the policy I would use if P were my state of knowledge and x were the observation I saw”. But that’s a policy, i.e. a mapping from observations to actions, so we need to call that policy on the actual observation in order to get an action, hence (FDT(P,x))(x).
Now, it’s not clear to me that FDT-policy actually needs the observation x in order to condition on what policy it is using. In other words, in the post I wrote the conditioned event as true(FDT(P̲, x̲) = π), but perhaps this should have just been true(FDT(P̲) = π). In that case, the call to FDT should look like FDT(P), which is a policy, and then to get an action we would write FDT(P)(x).
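To make the type distinction concrete, here is a minimal sketch in Python. Everything in it (the stand-in types, the dummy policy, the names `fdt_with_obs`/`fdt_without_obs`) is hypothetical illustration, not anything from the FDT paper; the only point is the signatures.

```python
from typing import Callable

Observation = str
Action = str
Policy = Callable[[Observation], Action]  # a policy maps observations to actions

# Reading 1: FDT(P, x) takes the observation too, but still returns a policy.
def fdt_with_obs(P: dict, x: Observation) -> Policy:
    # Dummy policy for illustration only.
    return lambda obs: "one-box" if obs == "blurry" else "two-box"

# Reading 2: FDT(P) selects the policy from the prior alone.
def fdt_without_obs(P: dict) -> Policy:
    return lambda obs: "one-box" if obs == "blurry" else "two-box"

P = {}            # stand-in for the state of knowledge
x = "blurry"      # stand-in for the observation

action_a = fdt_with_obs(P, x)(x)   # (FDT(P, x))(x): policy applied to x
action_b = fdt_without_obs(P)(x)   # FDT(P)(x): same, but x plays no role in selection
```

Either way the final expression has type action; the difference is only whether x appears in the policy-selection step.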
The left hand side of the equation has type action (Hintze page 4: “An agent’s decision procedure takes sense data and outputs an action.”), but the right hand side has type policy, right?
There I was just quoting from the Hintze paper, so it’s not clear what he meant. One interpretation is that the right hand side is just the definition of what “UDT(s)” means, so in that sense there wouldn’t be a type error; UDT(s) would also be a policy. But also, you’re right, a decision theory should in the end output an action.

The right notation all comes down to what I said in the last paragraph of my previous comment, namely: does UDT1.1/FDT-policy need to know the sense data s (or ‘observation x’, in the other notation) in order to condition on the agent using a particular policy? If the answer is yes, UDT(s) is a policy, and UDT(s)(s) is the action. If the answer is no, then UDT is the policy (confusing! because UDT is also the ‘decision algorithm’ that finds the policy in the particular decision problem you are facing) and UDT(s) is the action.

My best guess is that the answer to this question is ‘no’, so UDT is the policy and UDT(s) is the action, and your point about there being a type error is correct. But the notation f : s ↦ a in the Hintze paper makes it seem like somehow s is being used on the right hand side, which is possibly what confused me when I wrote the post.
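The two readings can be sketched side by side. This is a hedged illustration only: the sense data, actions, and names `udt_reading_1`/`udt_reading_2` are all made up, and the Hintze paper defines none of this code.

```python
from typing import Callable

SenseData = str
Action = str
Policy = Callable[[SenseData], Action]

# Reading 1: UDT(s) is a policy, so the action is UDT(s)(s).
def udt_reading_1(s: SenseData) -> Policy:
    # Dummy policy for illustration only.
    return lambda sense: "cooperate"

# Reading 2: UDT itself is the policy, so UDT(s) is already the action.
udt_reading_2: Policy = lambda sense: "cooperate"

s = "room-A"
action_1 = udt_reading_1(s)(s)  # policy applied to the sense data
action_2 = udt_reading_2(s)    # directly an action; udt_reading_2(s)(s) would be a type error
```

Under reading 2, writing UDT(s) on the left and a policy on the right really is a type mismatch, which is the point of the question being answered.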
Should this be FDT(P,x)? As is, this looks to me like the second (x) introduces x into scope, and the first x is an out-of-scope usage.