Wei Dai comments on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy

Wei Dai 17 Jan 2019 7:55 UTC
LW: 6 AF: 3
I noticed that the sum inside $\arg\max_{a} \sum_{s_1, \dots, s_n} \sum_{i=1}^{n} \mathrm{SSA}(s_i \text{ in } s_1, \dots, s_n \mid o, \pi_{o \to a}) \, U(s_n)$ is not actually an expected utility, because the SSA probabilities do not add up to 1 when there is more than one possible observation. The issue is that conditional on making an observation, the probabilities for the trajectories not containing that observation become 0, but the other probabilities are not renormalized. So this seems to be part way between “real” EDT and UDT (which does not set those probabilities to 0 and of course also does not renormalize).
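
To make the normalization point concrete, here is a minimal sketch on a hypothetical toy environment. Everything in it is an assumption for illustration: the three trajectories, their priors, and the helper names `unnormalized_ssa`, `total_mass`, and `renormalized_ssa` are invented, and the weight formula (a trajectory's prior split evenly among its states showing the observation) is only one reading of the SSA term, not necessarily the post's exact definition.

```python
from fractions import Fraction

# Hypothetical toy environment: each trajectory is a tuple of observations
# (one per state), mapped to its prior probability under the policy.
TRAJECTORIES = {
    ("red", "red"): Fraction(1, 2),
    ("red", "blue"): Fraction(1, 4),
    ("blue",): Fraction(1, 4),
}


def unnormalized_ssa(traj, i, obs):
    """Assumed SSA-style weight for 'I am at position i of traj', given obs.

    Trajectories not containing obs get weight 0; nothing is renormalized.
    """
    if traj[i] != obs:
        return Fraction(0)
    matches = sum(1 for o in traj if o == obs)
    return TRAJECTORIES[traj] / matches


def total_mass(obs):
    """Sum of the unnormalized weights over all trajectories and positions."""
    return sum(
        unnormalized_ssa(t, i, obs)
        for t in TRAJECTORIES
        for i in range(len(t))
    )


def renormalized_ssa(traj, i, obs):
    """Same weight divided by the total mass, so the weights sum to 1."""
    return unnormalized_ssa(traj, i, obs) / total_mass(obs)


if __name__ == "__main__":
    for obs in ("red", "blue"):
        print(obs, total_mass(obs))
```

In this toy case the unnormalized weights total 3/4 for "red" and 1/2 for "blue", so the inner sum is a scaled-down expected utility until one divides by that total.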

This zeroing of probabilities of trajectories not containing the current observation (and renormalizing, if one were to do that) seems at best useless busywork, and at worst prevents coordination between agents making different observations. In this formulation of EDT, such coordination is ruled out in another way, namely by specifying that conditional on o→a, the agent is still sure the rest of π is unchanged (i.e., copies of itself receiving other observations keep following π). If we remove the zeroing/renormalizing and say that the agent ought to have more realistic beliefs conditional on o→a, I think we end up with something close to UDT1.0 (modulo differences in the environment model from the original UDT).

(Oh, I ignored the splitting up of probabilities of trajectories into SSA probabilities and then adding them back up again, which may have some intuitive appeal but ends up being just a null operation. Does anyone see a significance to that part?)
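
One way to spell out why the split is a null operation, under an assumed form of the SSA term: write $\mathrm{obs}(s)$ for the observation made in state $s$ (notation introduced here, not the post's), and suppose the SSA term divides a trajectory's weight $P$ evenly among its states that show $o$. Here $P$ stands for whatever trajectory weight the SSA term uses, renormalized or not. Then, for any trajectory containing $o$:

```latex
\begin{align*}
\sum_{i=1}^{n} \mathrm{SSA}(s_i \text{ in } s_1, \dots, s_n \mid o, \pi_{o \to a})
  &= \sum_{i=1}^{n} P(s_1, \dots, s_n \mid o, \pi_{o \to a})
     \frac{[\mathrm{obs}(s_i) = o]}{\#\{j : \mathrm{obs}(s_j) = o\}} \\
  &= P(s_1, \dots, s_n \mid o, \pi_{o \to a})
\end{align*}
```

Since $U(s_n)$ does not depend on $i$, the objective reduces to $\sum_{s_1, \dots, s_n} P(s_1, \dots, s_n \mid o, \pi_{o \to a}) \, U(s_n)$, with no trace of the self-locating split.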
- Caspar Oesterheld 11 Sep 2020 22:52 UTC
  LW: 4 AF: 3
  Sorry for taking an eternity to reply (again).
  On the first point: Good point! I’ve now finally fixed the SSA probabilities so that they sum to 1, as they should for this to really be a version of EDT.
  > prevents coordination between agents making different observations.
  Yeah, coordination between different observations is definitely not optimal in this case. But I don’t see an EDT way of doing it well. After all, there are cases where given one observation, you prefer one policy and given another observation you favor another policy. So I think you need the ex ante perspective to get consistent preferences over entire policies.
  > (Oh, I ignored the splitting up of probabilities of trajectories into SSA probabilities and then adding them back up again, which may have some intuitive appeal but ends up being just a null operation. Does anyone see a significance to that part?)
  The only significance is to get a version of EDT, which we would traditionally assume to have self-locating beliefs. From a purely mathematical point of view, I think it’s nonsense.