Open Problems in FAI

Edit: Please note that this write-up is of poor quality, having the style of a hastily written personal note.

It has been mentioned that there should be a better write-up of open problems in FAI, and as I understand it there is an ongoing effort to explain such open problems. My feeling has been that the recent effort has tended to hold off on proposing solutions for too long. I prefer the approach in the Tiling Agents paper, which explained problems through example systems that fail in various respects. What follows is an outline of what I’d write if I spent significant time on it; I think it is enough to be of some use. This list very much reflects my personal interests & beliefs.

- Tarski’s Undefinability Theorem

- We would like a system to be able to reason about itself (in a few critical ways), and Tarski’s Theorem is one of the important obstacles. Kripke provided the first hope of progress in this area by showing that we can embed a partial truth predicate in a language if we accept a “gap” (statements which cannot be assessed as true or false within the system’s self-theory). Work over the decades has “reduced the gap” (capturing an increasing number of the self-judgements we want, while always leaving some gap). There are also “glut” theories (which must assess some statements as both true and false); these typically mirror gap theories. Paul Christiano provided a theory of probabilistic self-reference which intuitively reduces the “gap” to infinitesimal size: the system’s knowledge about its own probabilities can be wrong, but only by an infinitesimal. (For example, if it believes X, it may fail to believe P(X)=1, but it will still believe P(X)>c for every c<1.) (Note that this feels a bit like a “glut” theory, since the system solves the problem by saying too much rather than remaining silent.)
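For concreteness, the reflection principle in Christiano et al.’s draft (“Definability of Truth in Probabilistic Logic”) has, as best I recall, roughly the following shape, where P is the system’s probability assignment, the corner quotes denote Gödel encoding, and a, b range over the rationals:

```latex
% Approximate form of the probabilistic reflection schema (from memory):
% for every sentence X and all rationals a < b,
a < \mathbb{P}(X) < b
  \;\Longrightarrow\;
\mathbb{P}\bigl(\, a < \mathbb{P}(\ulcorner X \urcorner) < b \,\bigr) = 1
```

The parenthetical example above is the instance with P(X)=1: for any rational c<1 we have c < P(X) < 2, so the system assigns probability 1 to P(‘X’)>c, even though it need not assign probability 1 to P(‘X’)=1.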

- First Incompleteness Theorem

- Logical Uncertainty:

- First-Order (“easy”): assigning probabilities to the eventual results of (halting) computations we don’t have time to run. I claim this is mostly solved by the FOL prior: we can prefer simpler hypotheses about the behavior of systems, treating computations as black boxes which we predict via universal induction, while incorporating logical reasoning about their behavior via Bayesian updates. It also solves somewhat more than this; I claim it will have better properties than Solomonoff induction if the environment contains objects like halting oracles. (Some deficiencies with respect to reasoning about halting will be mentioned in the next section, however.)
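To give a concrete flavor of the FOL-prior idea, here is a minimal Monte Carlo sketch of a “random consistent theory” prior, scaled down to propositional logic so that consistency can be brute-forced. Everything here (the atoms, the sentence pool, the length-based weights, the sampling loop) is an illustrative assumption, not the actual construction, which works over first-order sentences and a proper enumeration.

```python
import itertools
import random
from functools import lru_cache

# Toy, propositional-only sketch of a "random consistent theory" prior
# (a hypothetical stand-in for the FOL prior). Sentences are sampled with
# weight decreasing in their length; a sampled sentence is kept only if it
# is consistent with what has been kept so far. The probability of a
# sentence is estimated as the fraction of sampled theories that entail it,
# and updating on a logical fact is conditioning on theories entailing it.

ATOMS = ("a", "b", "c")
MODELS = [dict(zip(ATOMS, values))
          for values in itertools.product((False, True), repeat=len(ATOMS))]

# A small, made-up sentence pool written as Python boolean expressions.
POOL = ["a", "b", "c", "not a", "not b", "a and b", "a or c", "not (a and c)"]
WEIGHTS = [2.0 ** -len(s) for s in POOL]  # shorter sentences are more probable


@lru_cache(maxsize=None)
def truth_row(sentence):
    """Truth value of `sentence` in each model (cached; reused constantly)."""
    return tuple(bool(eval(sentence, {}, dict(m))) for m in MODELS)


def satisfiable(theory):
    return any(all(truth_row(s)[i] for s in theory) for i in range(len(MODELS)))


def entails(theory, sentence):
    return all(truth_row(sentence)[i]
               for i in range(len(MODELS))
               if all(truth_row(s)[i] for s in theory))


def sample_theory(rng, draws=25):
    """Build a random consistent theory by rejecting inconsistent additions."""
    theory = []
    for _ in range(draws):
        s = rng.choices(POOL, weights=WEIGHTS)[0]
        if satisfiable(theory + [s]):
            theory.append(s)
    return theory


def prior(query, given=None, trials=500, seed=0):
    """Estimate P(query | given) over randomly generated consistent theories."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        theory = sample_theory(rng)
        if given is not None and not entails(theory, given):
            continue
        total += 1
        hits += entails(theory, query)
    return hits / total if total else float("nan")


if __name__ == "__main__":
    print("P(a)           ~", prior("a"))
    print("P(a | a or c)  ~", prior("a", given="a or c"))
```

Observed logical facts about a black-box computation would be fed in through the `given` argument here; in the real FOL prior, that conditioning step is where logical reasoning about a computation’s behavior gets incorporated as a Bayesian update.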

- Second-Order (“impossible”): If we want to assign probabilities to programs halting, facts of number theory, or facts of set theory, we’re in serious trouble. Using the FOL prior admits nonstandard models, and it’s not yet clear what qualities such a probability distribution should have. It seems reasonable to want universal statements to approach probability 1 as the set of positive examples approaches the set of all examples; but this turns out to be as hard to compute as the entire arithmetic hierarchy. I thought it would be reasonable to restrict this requirement to just the universal statements about computable predicates; i.e., halting facts and statements equivalent to them. Will Sawin proved that our beliefs about halting cannot come arbitrarily close to the correct values without the probabilities of some false Sigma_2 statements coming arbitrarily close to 1. It remains an open problem to construct such a prior. My proposal (restricting attention to sentences in the regular forms Pi_n or Sigma_n) is to require that a quantified statement only be introduced into a theory if a statement of the same form already exists; so sentences at a given level of the arithmetic hierarchy must wait for a sentence at the next lower level to be introduced. This does not block any true sentences from being produced, but it causes beliefs about halting facts to converge as possible halting times are eliminated. It is an open problem whether this proposed distribution converges. If this distribution exists, call it the bad arithmetic prior (BAP).
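To pin down the desideratum about universal statements mentioned above: one standard way to formalize it (the Gaifman condition, stated here from memory) is that the probability of a universal sentence should be the limit of the probabilities of its initial conjunctions of instances:

```latex
% Gaifman-style convergence desideratum (approximate statement):
\mathbb{P}\bigl(\forall x\, \varphi(x)\bigr)
  \;=\;
\lim_{n \to \infty} \mathbb{P}\bigl(\varphi(0) \wedge \varphi(1) \wedge \cdots \wedge \varphi(n)\bigr)
```

The restriction discussed above asks for this only when the predicate is computable, so that the universal sentence is Pi_1, i.e., equivalent to the negation of a halting fact.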

- Second Incompleteness Theorem

- Löbian Obstacle:

For a machine to plan its future actions, it needs to trust itself. The second incompleteness theorem (and, more generally, Löb’s theorem) makes this difficult. (All this is insufficiently formal, but the Tiling Agents paper gives a good explanation.) Several partial solutions have been proposed in a deterministic setting. It is an open problem whether one of Dan Willard’s several self-verifying systems solves this problem. (He had multiple proposals...) In case there is no purely logical solution, it seems intuitively promising to look for probabilistic self-trust. The previous section already presents difficulties here. If the BAP converges, then we can show that it has self-knowledge of that convergence of the form Paul Christiano described! This makes the false Sigma_2 beliefs feel more acceptable, because they are a necessary feature of the system’s self-reference. However, I think it’s the case that BAP ends up converging to 1 for *all* Sigma_2 statements, which is really terrible.
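For reference, here is the shape of the logical obstruction itself. Roughly, the self-trust we would want is for the agent’s theory T to prove a soundness schema about (its successor’s copy of) T, but Löb’s theorem says a consistent T can only have this trivially:

```latex
% Löb's theorem: for every sentence phi,
T \vdash \Box_T \varphi \rightarrow \varphi
  \;\;\Longrightarrow\;\;
T \vdash \varphi
% So T proves the soundness schema for phi only if it already proves phi;
% the instance phi = \bot recovers the second incompleteness theorem.
```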

- Anti-Löbian Obstacle:

In case the Löbian obstacle is solved, the anti-Löbian obstacle may become a concern.