Wei Dai comments on Pitfalls of Building UDT Agents

Wei Dai 31 Jul 2025 12:52 UTC
10 points
A lot of the things you state here with apparent certainty, e.g., “We only care about this universe,” are things that I think are potential problems, but am unsure about. For example, in “UDT shows that decision theory is more puzzling than ever” I wrote:

Indexical values are not reflectively consistent. UDT “solves” this problem by implicitly assuming (via the type signature of its utility function) that the agent doesn’t have indexical values. But humans seemingly do have indexical values, so what to do about that?

which I think is talking about the same or a related issue. A lot of these questions (e.g., whether or not we really care, or should care, only about this universe) seem like hard philosophical problems that can’t be solved easily, so directly trying to solve them, or confidently assuming some solution like “We only care about this universe”, as part of AI safety/alignment seems like a bad idea to me.
- Cole Wyeth 31 Jul 2025 13:09 UTC
  4 points
  This is exactly what I wanted to discuss with you. It seems we have different intuitions about the significance of ensembles. I realize that what I am saying here is not a priori obvious; it calls for a longer discussion. This is why I suggest it could be a dialogue, or maybe we can just chat about it informally.