A different angle on UDT

Instead of directly asking “what decision theory should an agent run,” consider the question, “given a description of an environment and a description of an agent, identify the best action available to that agent, regardless of its algorithm.”

These are pretty much the same question, with slightly different framings. (If you can answer the latter, then the answer to the former is “approximate that procedure, run on the agent’s own world model.”) The latter question is more general, though, and it highlights different parts of the problem.

For example, it highlights questions about what counts as an agent, how the agent is identified in the environment, how the set of things the agent “could have done” is determined, how the agent’s “observations” are identified, and so on. These parts of the question are where decision theory intersects with naturalized induction.

This angle puts the spurious proof problem in a new light. Instead of considering UDT as an algorithm which computes what its best available strategy is, consider a “global” version of UDT which takes any description of an environment and an agent within that environment, and computes the best strategy available to that agent.

Assume away the naturalized problems and say that there is a sorted list of outcomes (best first) and a set of available strategies (observation-to-action mappings). Now we can define GlobalUDT(E, A) as a function which takes a description E of the environment and a description A of the agent, and computes the best strategy available to that agent:

(I don’t know how to use LaTeX in code blocks, so braces are used as dequotes in strings: if x="ell", then "h{x}o" is the string "hello".)

def GlobalUDT(E, A):
  # outcomes is the list of outcomes sorted best-first; strategies is the
  # set of observation-to-action mappings available to the agent.
  for o in outcomes:
    for s in strategies:
      # Return the first strategy whose implication to the best remaining
      # outcome has a PA proof (the proof search is left abstract here).
      if PA proves "{A}()={s} implies {E}()={o}":
        return s
  # No provable implication found for any outcome/strategy pair.
  return default_strategy
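
For concreteness, here is a runnable Python sketch of the same skeleton. This is my own illustration, not part of the original proposal: the PA proof search is abstracted into a provable callback, since nothing here implements an actual theorem prover.

def global_udt(E, A, outcomes, strategies, provable, default_strategy=None):
  # outcomes: list of outcomes, sorted best-first.
  # strategies: iterable of available strategies.
  # provable(A, E, s, o): stand-in for 'PA proves "{A}()={s} implies {E}()={o}"'.
  for o in outcomes:
    for s in strategies:
      if provable(A, E, s, o):
        return s
  return default_strategy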

For a given universe U, Vladimir’s proof-based UDT can be recovered by defining the following via quining:

UDT := GlobalUDT(quote(U), quote(UDT))

And, as it happens, GlobalUDT will always identify the actual best strategy available to UDT. (GlobalUDT never finds bad spurious proofs, because it iterates through outcomes in order: if there are any spurious proofs, they are spurious proofs that the best strategy is the best strategy, which UDT then makes true by playing it.) But GlobalUDT does not identify the best strategy for agents in general!

Consider the following universe:

outcomes = [3, 2, 1]
strategies = {Hi, Med, Low}
A = lambda: Low
E = lambda: {Hi: 3, Med: 2, Low: 1}[A()]

That is, there are three options Hi, Med, Low, corresponding to payoffs 3, 2, 1. The “agent” always selects the action Low. GlobalUDT("E", "A") returns either Hi or Med, whichever it considers first, because both A()=Hi and A()=Med are provably false (and so vacuously imply E()=3, the first outcome considered).
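
To watch this happen, here is a hedged sketch that feeds the toy universe to the global_udt sketch above, with PA-provability replaced by a naive semantic check (a material implication evaluated on the actual run). That substitution is my own, but it reproduces the same vacuous-antecedent behavior: since A() never equals Hi, the implication to the best outcome holds trivially.

Hi, Med, Low = "Hi", "Med", "Low"
outcomes = [3, 2, 1]                        # best first
strategies = [Hi, Med, Low]
A = lambda: Low                             # the agent unconditionally plays Low
E = lambda: {Hi: 3, Med: 2, Low: 1}[A()]    # payoff determined by the agent's actual play

def naive_provable(A, E, s, o):
  # Stand-in for the proof search: a material implication that is
  # vacuously true whenever A() != s.
  return A() != s or E() == o

print(global_udt(E, A, outcomes, strategies, naive_provable))
# -> 'Hi': the antecedent A()=Hi is false, so the implication to the best
#    outcome (3) holds vacuously and Hi is returned first.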

This isn’t all that surprising (what does it mean to ask what would have happened if (lambda: Low)() == Hi?), but it does distill the problem with using logical implications as counterfactuals a bit. GlobalUDT only identifies the right strategy for UDT because UDT happens to be the kind of thing in the environment whose behavior can’t be pinned down without knowing that PA is consistent, and GlobalUDT doesn’t know that.

(But a version of GlobalUDT using ZFC would prescribe bad strategies for a PA-based version of UDT in the environment.)

Moving beyond proof-based UDT requires a better theory of logical counterfactuals, and GlobalUDT provides a slightly different angle of approach than the one I’m used to. Intuitively, a satisfactory theory of logical counterfactuals should give us an algorithm that identifies Hi as the best strategy available to the constant agent lambda: Low. I’m actually not too convinced that such a thing is possible or meaningful (at least, not without context and better answers to questions like “what’s an agent” and “what’s an embedding”), but this makes it a bit clearer that when we talk about wanting a full theory of logical counterfactuals, we’re talking about defining a working version of GlobalUDT.

Food for thought.