This post makes two different points:

1. Path-dependent preferences are not necessarily incoherent in practice. Therefore, the expected-utility-related coherence theorems are too strong, and the correct selection theorems for agents actually generated by base optimizers will involve some weaker notion than expected utility maximization.
2. Path-dependent preferences are well-described by subagents. One particularly strong reason for this is the subagents argument: subagents are sufficient to describe any path-dependent consistent preferences.
To decide whether subagents are the right model, I think we need both:

1. additional arguments for the niceness of subagents: how many subagents are required to represent path-dependent agents in practice? Are there reasons why subagents should arise from the selection process instead of a single EU maximizer?
2. to investigate other possible models for path-dependent consistent behavior. Maybe one such model is a utility function with an explicit path-dependence term, though there are probably better ones I haven't thought of.
The number of subagents required to represent a partial preference ordering is the order dimension of the corresponding poset. If this is not O(log n) in the number of states, it would be bad for the subagents hypothesis! There are exponentially many possible states of the world, so a superlogarithmic order dimension would mean agents need a number of subagents superlinear in the number of atoms in the world. So what are the order dimensions of posets we care about? I found the following results with a brief search:
The order dimension of a poset is less than or equal to its width (the size of the largest set of pairwise incomparable elements). Source.
This doesn’t seem like a useful upper bound. If you have two sacred values, lives and beauty, then there are likely to be arbitrarily many incomparable states on the lives-beauty Pareto frontier, but the order dimension is two.
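To make the width-versus-dimension gap concrete, here is a quick sketch (the code and helper names are my own, purely for illustration): a family of states on a two-value Pareto frontier has width equal to its size, yet two total orders, one "subagent" per sacred value, realize the dominance order exactly.

```python
n = 10
# n states on the lives-beauty Pareto frontier (each trades one unit of
# lives for one unit of beauty), plus some strictly dominated states.
frontier = [(i, n - i) for i in range(n)]
dominated = [(i + 0.5, n - i - 1.5) for i in range(n - 1)]
states = frontier + dominated

def dominates(a, b):
    """Strict Pareto dominance: the agent's partial preference order."""
    return a != b and a[0] >= b[0] and a[1] >= b[1]

# Width: the frontier is one big antichain of size n.
antichain = all(
    not dominates(a, b) and not dominates(b, a)
    for a in frontier for b in frontier if a != b
)
print(antichain)  # True

# Order dimension <= 2: one subagent ranks states by lives, the other by
# beauty; the agent prefers a over b iff BOTH subagents do. Check that
# this intersection reproduces Pareto dominance on all states.
realized = all(
    ((a[0] > b[0]) and (a[1] > b[1])) == dominates(a, b)
    for a in states for b in states if a != b
)
print(realized)  # True
```

So an agent with arbitrarily many mutually incomparable options can still decompose into just two subagents, which is why width is such a loose upper bound on dimension.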
This paper finds the following bounds for the order dimension of a random poset $P_{n,p}$ (defined by taking a random graph on $n$ vertices where each edge is present with probability $p$, orienting each edge from lower to higher vertex, then taking the transitive closure). If $p \log\log n \to \infty$, the following holds almost surely:
$$(1-\epsilon)\sqrt{\frac{\log n}{\log(1/q)}} \;\le\; \dim P_{n,p} \;\le\; (1+\epsilon)\,\frac{4}{3}\sqrt{\frac{\log n}{\log(1/q)}}, \qquad \text{where } q = 1-p.$$
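This construction is easy to play with directly. Here is a sketch (my own code; the comparable-pair fraction is only a crude proxy for how constrained the order is, since computing order dimension itself is NP-hard in general):

```python
import random

def random_graph_order(n, p, seed=0):
    """P_{n,p}: include each edge i--j (i < j) with probability p,
    orient it i -> j, then take the transitive closure."""
    rng = random.Random(seed)
    below = [set() for _ in range(n)]   # below[j]: all i with i < j in the order
    for j in range(n):
        for i in range(j):
            if rng.random() < p:
                below[j].add(i)
                below[j] |= below[i]    # closure: everything under i is under j
    return below

def comparable_fraction(below):
    n = len(below)
    return sum(len(s) for s in below) / (n * (n - 1) / 2)

# Higher p makes more pairs comparable, which is why the dimension
# bound shrinks as p grows:
for p in (0.01, 0.5, 0.99):
    print(p, round(comparable_fraction(random_graph_order(200, p)), 2))
```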
The order dimension of a random poset decreases as p increases. We should expect agents in the real world to have reasonably high p, since refusing to make a large proportion of trades is probably bad for reward.
If $p=0.99$, then $\dim P_{n,p} \le (1+\epsilon)\,0.95\sqrt{\log n}$
If $p=0.5$, then $\dim P_{n,p} \le (1+\epsilon)\,2.43\sqrt{\log n}$
If $p=0.01$, we have $15.1\,(1-\epsilon)\sqrt{\log n} \le \dim P_{n,p} \le 20.2\,(1+\epsilon)\sqrt{\log n}$
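These coefficients can be recovered from the bound above, assuming base-10 logarithms (the only base that matches the quoted numbers; for $p=0.99$ I get 0.94 rather than 0.95, presumably a rounding difference):

```python
from math import log10, sqrt

def dim_bounds_coeff(p):
    """Coefficients (c_lo, c_hi) such that, per the quoted bound,
    c_lo*(1-eps)*sqrt(log n) <= dim P_{n,p} <= c_hi*(1+eps)*sqrt(log n)."""
    q = 1 - p
    c_lo = 1 / sqrt(log10(1 / q))
    return c_lo, (4 / 3) * c_lo

for p in (0.99, 0.5, 0.01):
    lo, hi = dim_bounds_coeff(p)
    print(f"p={p}: {lo:.2f} sqrt(log n) <= dim <= {hi:.2f} sqrt(log n)")
```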
This is still way too many subagents (on the order of the square root of the number of atoms in the world, since log n scales with the atom count) to actually make sense as e.g. a model of humans, but at least that many subagents can physically fit inside an agent.
Of course, this is just a heuristic argument, and if partial preference orderings in real life have some special structure, the conclusion might differ.
Hmm, I may be missing something here, but I suspect that "partial preference orderings in real life have some special structure," in the relevant sense, is very likely true. Human preferences don't appear to be a random sample from the set of all possible partial orders over world states (or, more accurately, over human models of worlds).
First of all, if you model human preferences as a vector-valued utility function (i.e. one element of the vector per subagent) it seems that it has to be continuous, and probably Lipschitz, in the sense that we’re limited in how much we can care about small changes in the world state. There’s probably some translation of this property into graph theory that I’m not aware of.
Also, it seems like there’s one or a handful of preferred factorizations of our world model into axes-of-value, and different subagents will care about different factors/axes. More specifically, it appears that human preferences have a strong tendency to track the same abstractions that we use for empirical prediction of the world; as John says, human values are a function of humans’ latent variables. If you stop believing that souls and afterlives exist as a matter of science, it’s hard to continue sincerely caring about what happens to your soul after you die. We also don’t tend to care about weird contrived properties with no explanatory/predictive power like “grue” (green before 1 January 2030 and blue afterward).
To the extent this is the case, it should dramatically (exponentially, I think) reduce the number of posets that are realistically possible, and therefore the number of subagents needed to describe them.