Can noise have power?
One of the most interesting debates on Less Wrong that seems like it should be definitively resolvable is the one between Eliezer Yudkowsky, Scott Aaronson, and others on The Weighted Majority Algorithm. I’ll reprint the debate here in case anyone wants to comment further on it.
In that post, Eliezer argues that “noise hath no power” (read the post for details). Scott disagreed. He replied:
...Randomness provably never helps in average-case complexity (i.e., where you fix the probability distribution over inputs) -- since given any ensemble of strategies, by convexity there must be at least one deterministic strategy in the ensemble that does at least as well as the average.
On the other hand, if you care about the worst-case running time, then there are settings (such as query complexity) where randomness provably does help. For example, suppose you’re given n bits, you’re promised that either n/3 or 2n/3 of the bits are 1′s, and your task is to decide which. Any deterministic strategy to solve this problem clearly requires looking at 2n/3 + 1 of the bits. On the other hand, a randomized sampling strategy only has to look at O(1) bits to succeed with high probability.
Whether randomness ever helps in worst-case polynomial-time computation is the P versus BPP question, which is in the same league as P versus NP. It’s conjectured that P=BPP (i.e., randomness never saves more than a polynomial). This is known to be true if really good pseudorandom generators exist, and such PRG’s can be constructed if certain problems that seem to require exponentially large circuits, really do require them (see this paper by Impagliazzo and Wigderson). But we don’t seem close to proving P=BPP unconditionally.
Scott, I don’t dispute what you say. I just suggest that the confusing term “in the worst case” be replaced by the more accurate phrase “supposing that the environment is an adversarial superintelligence who can perfectly read all of your mind except bits designated ‘random’”.
I often tell people that theoretical computer science is basically mathematicized paranoia, and that this is the reason why Israelis so dominate the field. You’re absolutely right: we do typically assume the environment is an adversarial superintelligence. But that’s not because we literally think it is one, it’s because we don’t presume to know which distribution over inputs the environment is going to throw at us. (That is, we lack the self-confidence to impose any particular prior on the inputs.) We do often assume that, if we generate random bits ourselves, then the environment isn’t going to magically take those bits into account when deciding which input to throw at us. (Indeed, if we like, we can easily generate the random bits after seeing the input—not that it should make a difference.)
Average-case analysis is also well-established and used a great deal. But in those cases where you can solve a problem without having to assume a particular distribution over inputs, why complicate things unnecessarily by making such an assumption? Who needs the risk?
And later added:
...Note that I also enthusiastically belong to a “derandomize things” crowd! The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I’m unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)
I certainly don’t say “it’s not hard work”, and the environmental probability distribution should not look like the probability distribution you have over your random numbers—it should contain correlations and structure. But once you know what your probability distribution is, then you should do your work relative to that, rather than assuming “worst case”. Optimizing for the worst case in environments that aren’t actually adversarial, makes even less sense than assuming the environment is as random and unstructured as thermal noise.
I would defend the following sort of statement: While often it’s not worth the computing power to take advantage of all the believed-in regularity of your probability distribution over the environment, any environment that you can’t get away with treating as effectively random, probably has enough structure to be worth exploiting instead of randomizing.
(This isn’t based on career experience, it’s how I would state my expectation given my prior theory.)
> “once you know what your probability distribution is...”
I’d merely stress that that’s an enormous “once.” When you’re writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see here for example, or Knuth vol 2).
Even so, intuition suggests it should be possible to design PRG’s that defeat anything the world is likely to throw at them. I share that intuition; it’s the basis for the (yet-unproved) P=BPP conjecture.
“Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin.”—von Neumann
And that’s where the debate drops off, at least between Eliezer and Scott, at least on that thread.