(I don’t speak for Abram but I wanted to explain my own opinion.) Decision theory asks: given certain beliefs an agent has, what is the rational action for em to take? But what are these “beliefs”? Different frameworks give different answers. For example, in CDT a belief is a causal diagram. In EDT a belief is a joint distribution over actions and outcomes. In UDT a belief might be something like a Turing machine (inside the execution of which the agent is supposed to look for copies of emself). Learning theory gives us some insight here through the observation that beliefs must be learnable; otherwise, how would the agent come up with these beliefs in the first place? There might be parts of the beliefs that come from the prior and cannot be learned, but, at the very least, the type signature of beliefs should be compatible with learning.
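To make the contrast concrete, here is a rough sketch of the different type signatures these frameworks expect a “belief” to have. The encodings below are purely illustrative (my own gloss, not anyone’s official definitions); the point is only that the input object differs across decision theories, and for each of them one has to ask how such an object could ever be learned.

```python
# Illustrative type signatures only; the encodings are caricatures, not formal definitions.
from typing import Callable, Dict, Tuple

Action, Outcome = str, str

# CDT: a causal model, caricatured here as interventional outcome distributions,
# i.e. something that answers P(outcome | do(action)).
CausalBelief = Callable[[Action], Dict[Outcome, float]]

# EDT: a joint distribution over actions and outcomes, P(action, outcome).
EvidentialBelief = Dict[Tuple[Action, Outcome], float]

# UDT: a program / world-model that the agent's policy gets plugged into,
# inside which the agent is supposed to look for copies of itself.
UpdatelessBelief = Callable[[Callable[..., Action]], Outcome]
```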
Moreover, decision problems are often implicitly described from the point of view of a third party. For example, in Newcomb’s paradox we postulate that Omega can predict the agent, which makes perfect sense for an observer looking from the side, but might be difficult to formulate from the point of view of the agent itself. Therefore, understanding decision theory requires translating beliefs from the point of view of one observer to the point of view of another. Here, too, learning theory can help us: we can ask, what beliefs should Alice expect Bob to learn, given particular beliefs of Alice about the world? From a slightly different angle, the central source of difficulty in decision theory is the notion of counterfactuals, and the attempt to prescribe a particular meaning to them, which different decision theories do differently. Instead, we can just postulate that, from the subjective point of view of the agent, counterfactuals are ontologically basic. The agent believes emself to have free will, so to speak. Then the interesting question is: what kind of counterfactuals are produced by translating beliefs from the perspective of a third party to the perspective of the given agent?
Indeed, thinking about learning theory led me to the notion of quasi-Bayesian agents (agents that use incomplete/fuzzy models), and quasi-Bayesian agents automatically solve all Newcomb-like decision problems. In other words, quasi-Bayesian agents are effectively a rigorous version of UDT.
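To give a flavor of why incomplete models help on Newcomb-like problems, here is a toy maximin calculation. This is only a caricature of the quasi-Bayesian machinery (the hypothesis class, the accuracy parameter and the payoffs are made up for the example): the agent’s belief is a *set* of environments saying only that Omega predicts its action with at least some accuracy, and the agent chooses the action with the best worst-case expected utility over that set.

```python
# Toy illustration (not the actual formalism): maximin over an incomplete model
# of Newcomb's problem, where the model only says "Omega predicts me with
# probability >= accuracy" and leaves everything else unspecified.
from itertools import product

ACTIONS = ["one-box", "two-box"]

def payoff(action, prediction):
    # The opaque box holds $1M iff Omega predicted one-boxing;
    # the transparent $1000 box is taken only when two-boxing.
    big = 1_000_000 if prediction == "one-box" else 0
    small = 1_000 if action == "two-box" else 0
    return big + small

def environments(accuracy=0.99):
    # Each environment maps an action to a distribution over Omega's prediction.
    # We enumerate a few extreme points of the set consistent with the constraint
    # "Omega predicts the agent's action with probability >= accuracy".
    envs = []
    for p_one, p_two in product([accuracy, 1.0], repeat=2):
        envs.append({
            "one-box": {"one-box": p_one, "two-box": 1 - p_one},
            "two-box": {"two-box": p_two, "one-box": 1 - p_two},
        })
    return envs

def expected_utility(action, env):
    return sum(p * payoff(action, pred) for pred, p in env[action].items())

# Maximin: worst-case expected utility over the incomplete model, per action.
worst = {a: min(expected_utility(a, e) for e in environments()) for a in ACTIONS}
print(worst)                      # {'one-box': 990000.0, 'two-box': 1000.0}
print(max(worst, key=worst.get))  # one-box
```

In this toy version the worst case for one-boxing is still about $990k while the worst case for two-boxing is $1k, so the maximin agent one-boxes, which is the UDT-like verdict; the real quasi-Bayesian treatment is more subtle, but this is the basic flavor.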
Incidentally, to align AI we literally need to translate beliefs from the user’s point of view to the AI’s point of view. This is also solved via the same quasi-Bayesian approach. In particular, this translation process preserves the “point of updatelessness”, which, in my opinion, is the desired result (the choice of this point is subjective).