Computer Scientists: For some problems, there are random algorithms with the property that they succeed with high probability for any possible input. No deterministic algorithm for these problems has this property. Therefore, random algorithms are superior.
Eliezer: But if we knew the probability distribution over possible inputs, we could create a deterministic algorithm with the property.
Computer Scientists: But we do not know the probability distribution over possible inputs.
Eliezer: Never say “I don’t know”! If you are in a state of ignorance, use an ignorance prior.
Now of course the key question is what sort of ignorance prior we should use. In Jaynes’s book, usually ignorance priors with some sort of nice invariance properties are used, which makes calculations simpler. For example if the input is a bit stream then we could assume that the bits are independent coinflips. However, in real life this does not correspond to a state of ignorance, but rather to a state of knowledge where we know that the bit stream does not contain predictable correlations. For example, the probability of 1000 zeros in a row according to this ignorance prior is 10^{-300}, which is not even remotely close to the intuitive probability.
The next step is to try to create an ignorance prior which somehow formalizes Occam’s razor. As anyone familiar with MIRI’s work on the problem of logical priors should know, this is more difficult than it sounds. Essentially, the best solution so far (see “Logical Induction”, Garrabrant et al. 2016) is to create a list of all of the ways the environment could contain predictable correlations (or in the language of this post, resemble an “adversarial telepath”), and then trade them off of each other to get a final probability. One of the main downsides of the algorithm is that it is not possible to list all of the possible ways the environment could be correlated (since there are infinitely many), so you have to limit yourself to taking a feasible sample.
Now, it is worth noting that the above paper is concerned with an “environment” which is really just the truth-values of mathematical statements! It is hard to see how this environment resembles any sort of “adversarial telepath”. But if we want to maintain the ethos of this post, it seems that we are forced to this conclusion. Otherwise, an environment with logical uncertainty could constitute a counterexample to the claim that randomness never helps.
To be precise, let f be a mathematical formula with one free variable representing an integer, and suppose we are given access to an oracle which can tell us the truth-values of the statements f(1),...,f(N). The problem is to compute (up to a fixed accuracy) the proportion of statements which are true, with the restriction that we can only make n queries, where n << N. Monte Carlo methods succeed with failure probability exponential in n, regardless of what f is.
Now suppose that f is determined by choosing m bits randomly, where m << n, and interpreting them as a mathematical formula (throwing out the results and trying again if it is impossible to do so). Then if the minimum failure probability is nonzero, it is exponential in m, not n, and therefore bigger than the failure probability for Monte Carlo methods. However, any algorithm which can be encoded in less than ~m bits fails with nonzero probability for diagonalization reasons.
In fact, the diagonalization failure is one of the least important ones, the main point is that you just don’t know enough about the environment to justify writing any particular algorithm. Any deterministically chosen sequence has a good chance of being correlated with the environment, just because the environment is math and math things are often correlated. Now, we can call this an “adversarial telepath” if we want, but it seems to occur often enough in real life that this designation hardly seems to “carve reality at its joints”.
TL;DR: If even math can be called an “adversarial telepath”, the term seems to have lost all its meaning.
So let me see if I’ve got this straight.
Computer Scientists: For some problems, there are random algorithms with the property that they succeed with high probability for any possible input. No deterministic algorithm for these problems has this property. Therefore, random algorithms are superior.
Eliezer: But if we knew the probability distribution over possible inputs, we could create a deterministic algorithm with the property.
Computer Scientists: But we do not know the probability distribution over possible inputs.
Eliezer: Never say “I don’t know”! If you are in a state of ignorance, use an ignorance prior.
Now of course the key question is what sort of ignorance prior we should use. In Jaynes’s book, usually ignorance priors with some sort of nice invariance properties are used, which makes calculations simpler. For example if the input is a bit stream then we could assume that the bits are independent coinflips. However, in real life this does not correspond to a state of ignorance, but rather to a state of knowledge where we know that the bit stream does not contain predictable correlations. For example, the probability of 1000 zeros in a row according to this ignorance prior is 10^{-300}, which is not even remotely close to the intuitive probability.
The next step is to try to create an ignorance prior which somehow formalizes Occam’s razor. As anyone familiar with MIRI’s work on the problem of logical priors should know, this is more difficult than it sounds. Essentially, the best solution so far (see “Logical Induction”, Garrabrant et al. 2016) is to create a list of all of the ways the environment could contain predictable correlations (or in the language of this post, resemble an “adversarial telepath”), and then trade them off of each other to get a final probability. One of the main downsides of the algorithm is that it is not possible to list all of the possible ways the environment could be correlated (since there are infinitely many), so you have to limit yourself to taking a feasible sample.
Now, it is worth noting that the above paper is concerned with an “environment” which is really just the truth-values of mathematical statements! It is hard to see how this environment resembles any sort of “adversarial telepath”. But if we want to maintain the ethos of this post, it seems that we are forced to this conclusion. Otherwise, an environment with logical uncertainty could constitute a counterexample to the claim that randomness never helps.
To be precise, let f be a mathematical formula with one free variable representing an integer, and suppose we are given access to an oracle which can tell us the truth-values of the statements f(1),...,f(N). The problem is to compute (up to a fixed accuracy) the proportion of statements which are true, with the restriction that we can only make n queries, where n << N. Monte Carlo methods succeed with failure probability exponential in n, regardless of what f is.
Now suppose that f is determined by choosing m bits randomly, where m << n, and interpreting them as a mathematical formula (throwing out the results and trying again if it is impossible to do so). Then if the minimum failure probability is nonzero, it is exponential in m, not n, and therefore bigger than the failure probability for Monte Carlo methods. However, any algorithm which can be encoded in less than ~m bits fails with nonzero probability for diagonalization reasons.
In fact, the diagonalization failure is one of the least important ones, the main point is that you just don’t know enough about the environment to justify writing any particular algorithm. Any deterministically chosen sequence has a good chance of being correlated with the environment, just because the environment is math and math things are often correlated. Now, we can call this an “adversarial telepath” if we want, but it seems to occur often enough in real life that this designation hardly seems to “carve reality at its joints”.
TL;DR: If even math can be called an “adversarial telepath”, the term seems to have lost all its meaning.