Time hierarchy theorems for distributional estimation problems

paulfchristiano 20 Apr 2016 17:13 UTC

LW: 3 AF: 2

Warning: mostly for fun / basic science

## Preliminaries

Hierarchy theorems

The time hierarchy theorem is one of the simplest results in complexity theory. It says that if $f (n) ≫ g (n)$ , then there are functions that we can compute in time $f (n)$ that we can’t compute in time $g (n)$ . For example, there are functions that you can compute in $n^{2}$ time that you can’t compute in $n$ time.

Hierarchy theorems are proved by diagonalization. Consider the problem, “does machine $M$ halt in time at most $n^{1.5}$?” This problem can easily be solved in $n^{2}$ time. But if some machine solved this problem in $n$ time, we could derive a contradiction by asking that machine about itself.
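As a rough illustration of the diagonal idea (a toy sketch, not a construction on real Turing machines), the Python snippet below models each "machine" as a generator that yields once per step. The names `run_with_budget`, `diagonal`, `MACHINES`, `BUDGET`, and the three example machines are hypothetical and exist only for this example. The diagonal function simulates machine `i` on its own index with a step budget and outputs the opposite answer, so it disagrees with every listed machine that halts within the budget.

```python
from typing import Callable, Generator, List, Optional

# A toy "machine" is a generator function: each `yield` counts as one step,
# and the generator's return value is the machine's 0/1 output.
Machine = Callable[[int], Generator[None, None, int]]


def run_with_budget(machine: Machine, x: int, budget: int) -> Optional[int]:
    """Simulate `machine` on `x` for at most `budget` steps.

    Returns the output if the machine halts in time, otherwise None.
    """
    gen = machine(x)
    for _ in range(budget + 1):
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
    return None


def diagonal(machines: List[Machine], i: int, budget: int) -> int:
    """Output the opposite of what machine i says about its own index."""
    out = run_with_budget(machines[i], i, budget)
    return 0 if out == 1 else 1


def always_one(x: int):
    yield
    return 1


def parity(x: int):
    for _ in range(x):
        yield
    return x % 2


def slow_zero(x: int):
    # Takes far more steps than the budget used below.
    for _ in range(1000):
        yield
    return 0


MACHINES = [always_one, parity, slow_zero]
BUDGET = 10

for i in range(len(MACHINES)):
    own_answer = run_with_budget(MACHINES[i], i, BUDGET)
    print(i, own_answer, diagonal(MACHINES, i, BUDGET))
```

Note that computing `diagonal` costs more than the budget it grants to the simulated machines, which mirrors why the hard function needs the larger time bound.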

This proof strategy is very blunt. One way to formalize its bluntness is to introduce the notion of relative complexity. Rather than considering normal computers, we consider a computer that has access to a black box computing a particular function $f$ . Hierarchy theorems hold relative to any function $f$ .

Relativization is a hallmark of “easy” complexity theoretic results (i.e. those that we can prove). We can prove very few separations that don’t relativize. (Scott Aaronson has introduced a slightly stronger notion of algebrization which more accurately captures what we can actually prove, and we can prove a few more lower bounds on low-depth circuits.)

Distributional estimation problems

A distributional estimation problem is a sequence of distributions $μ_{n}$ over pairs $(x, y) \in \{0, 1\}^{n} \times [0, 1]$. The goal of an estimator is to approximate $y$ given $x$. The score of an estimator $A$ on $μ_{n}$ is the expected squared error, i.e. the expectation of ${(A (x) - y)}^{2}$, for pairs $(x, y)$ drawn from $μ_{n}$. If $A$ is a probabilistic estimator, then we also take an expectation over $A$’s internal randomness.

(This definition is due to Vadim Kosoy.)
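The score defined above can be approximated by sampling. The sketch below is only an illustration under assumed names: `sample_pair`, `exact_estimator`, `constant_estimator`, `estimate_score`, and the toy distribution (uniform 8-bit strings with $y$ equal to the fraction of ones) are all hypothetical and not part of the original definition.

```python
import random

N_BITS = 8  # hypothetical input length n


def sample_pair(rng: random.Random):
    """Draw (x, y): x is a uniform n-bit string, y is its fraction of ones."""
    x = tuple(rng.randint(0, 1) for _ in range(N_BITS))
    return x, sum(x) / N_BITS


def exact_estimator(x, rng):
    """Recomputes y exactly from x (ignores its randomness)."""
    return sum(x) / len(x)


def constant_estimator(x, rng):
    """Ignores x and always guesses 1/2."""
    return 0.5


def estimate_score(estimator, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected squared error of `estimator`.

    The same generator supplies both the sample (x, y) and any internal
    randomness the estimator wants to use.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, y = sample_pair(rng)
        total += (estimator(x, rng) - y) ** 2
    return total / trials


print(estimate_score(exact_estimator))
print(estimate_score(constant_estimator))
```

For this toy distribution the exact estimator has expected squared error 0, while the constant guess has expected squared error equal to the variance of the fraction of ones, $1/(4n) = 1/32$; the Monte Carlo figure for the constant guess should land near that value.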

Let’s say that $A$ is a better estimator than $B$ on a distributional estimation problem if there is a constant $ϵ > 0$ and an $N$ such that for every $n > N$, $A$’s score on $μ_{n}$ is at least $ϵ$ lower than $B$’s score (i.e., such that the lim inf of $B$’s score minus $A$’s score is strictly positive). Lower is better here because the score is an expected squared error.

Time hierarchy for distributional estimation problems

Now we can ask:

Is there a distributional estimation problem $μ_{n}$ and an estimator $A$ running in time $O (n^{3})$ such that $A$ is a better estimator on $μ_{n}$ than any estimator $B$ running in time $O (n^{2})$ ?

The answer is almost certainly “yes,” and there is a very natural hard problem: sample a machine $M$ which runs in time $n^{2.5}$ and estimate the expected value of $M()$.

Time hierarchy does not relativize for distributional estimation problems

We can construct a probabilistic oracle such that exactly the same set of distributional estimation problems can be solved in time $O (n \log n)$ as can be solved by any algorithm running in any amount of time.

Namely, consider the construction of reflective oracles from this paper. With this oracle in hand, for any estimator $A$ running in any amount of time, there is an estimator $B$ running in time $O (n \log n)$ which approximates the results of running $A$ up to error $1 / n$, and in particular which is not a worse predictor than $A$.

On input $x$ , $A$ queries the reflective oracle to estimate the expected value of $B (x)$ . It starts by comparing this expected value to $1 / 2$ , then performs a binary search to narrow down the value to an interval of length $1 / n$ . This gives us error of $o (1)$ , and it works regardless of how expensive $B$ is to compute.
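The binary search step can be sketched as follows. This is a minimal illustration, assuming a hypothetical callback `is_above(p)` that answers "is the expected value of the expensive estimator on $x$ greater than $p$?". A real reflective oracle is probabilistic and may answer arbitrarily when the expected value equals $p$ exactly; the sketch ignores that and treats the answers as exact. The function name `estimate_by_binary_search` is also an assumption.

```python
from typing import Callable, List


def estimate_by_binary_search(is_above: Callable[[float], bool], n: int) -> float:
    """Narrow the unknown value in [0, 1] to an interval of length <= 1/n.

    `is_above(p)` is a stand-in for one oracle query comparing the
    expected value against the threshold p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > 1.0 / n:
        mid = (lo + hi) / 2
        if is_above(mid):
            lo = mid  # the value lies in the upper half
        else:
            hi = mid  # the value lies in the lower half
    return (lo + hi) / 2


# Toy stand-in for the oracle: the hidden expected value is fixed here.
hidden_value = 0.3141
queries: List[float] = []


def toy_oracle(p: float) -> bool:
    queries.append(p)
    return hidden_value > p


estimate = estimate_by_binary_search(toy_oracle, n=1000)
print(estimate, len(queries))
```

The number of oracle queries grows like $\log n$, independent of how long the estimator being approximated would take to run, which is the point of the argument.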

This argument is relative to a certain probabilistic oracle. It would be more convincing if the containment failed relative to some deterministic oracle. I’m not sure if it does.

A natural candidate deterministic oracle is one which takes as input a randomized (oracle) Turing machine $M$, a probability $p$, an accuracy $ϵ$, and an auxiliary input $x \in \{0, 1\}^{1 / ϵ}$. The oracle makes no guarantees at all about its behavior on any particular input $x$. But if $M$ accepts with probability strictly more than $p + ϵ$ in time $T$, then the oracle guarantees that it returns $1$ on at least $2 / 3$ of the possible tuples $(M, p, ϵ, x)$. And conversely, if $M$ accepts with probability strictly less than $p - ϵ$, then the oracle guarantees that it returns $1$ on at most $1 / 3$ of the possible tuples $(M, p, ϵ, x)$.

If such an oracle exists, then time hierarchy for distributional estimation problems certainly doesn’t hold with respect to this oracle. I can’t really tell whether such an oracle exists; my guess is that it does.

I think that the existence of such a deterministic oracle is itself an interesting question.

Conclusion

Hierarchy theorems are practically the only “easy” separation results in complexity theory. But they are easy to prove for reasons that seem morally unrelated to the real reasons that they are true.

In some sense distributional estimation problems are more natural than conventional decision problems. If we can’t prove time hierarchy for these problems, it is arguably one of the most fundamental gaps in our understanding of complexity theory, even more glaring than (for example) P vs PSPACE.

Because time hierarchy doesn’t seem to relativize for distributional estimation problems, I think there is a good chance that existing techniques can’t prove it. That said, there may also be a simple argument for hierarchy that I overlooked.

Stuart_Armstrong 22 Apr 2016 12:12 UTC
0 points
AF

On input $x$ , $A$ queries the reflective oracle...

Did you flip the definitions of $A$ and $B$ in that paragraph compared with the previous one?
Vanessa Kosoy 21 Apr 2016 17:30 UTC
0 points
AF
Unless I’m missing something, if you consider arbitrary distributions then a time hierarchy definitely exists. To see this consider a Solomonoff distribution. For this distribution any problem that cannot be solved in given worst case complexity will have error bounded from below for any estimator of this complexity.

It is interesting to consider the question for computable or samplable distributions. For this I don’t have an immediate answer but we should probably look for time-hierarchy results in average-case complexity (I’m working through a crappy connection now so I can’t google it properly).
- paulfchristiano 21 Apr 2016 22:41 UTC
  0 points
  AF Parent
  I think that I posed the question badly. It’s not clear how to prove time hierarchy even for the worst-case estimation problems; the tricky part is the estimation, not the distribution.
  - Vanessa Kosoy 22 Apr 2016 16:42 UTC
    0 points
    AF Parent
    Decision problems are a special case of estimation problems, so it is sufficient to prove a time hierarchy for probabilistic algorithms. To see this, consider a decision problem $D \subseteq \{0, 1\}^{*}$ and an estimator $Q$. We can now construct a probabilistic algorithm $A$ s.t. $\Pr [A (x) = 1] = E [Q (x)]$. We have $\Pr [A (x) \neq χ_{D} (x)] \leq \sqrt{E [(Q (x) - χ_{D} (x))^{2}]}$. Conversely, a probabilistic algorithm can be trivially interpreted as an estimator. So, a probabilistic algorithm with time complexity $t (k)$ and error probability $\frac{1}{6}$ gives an estimator with time complexity $t (k)$ and error $\frac{1}{6}$. On the other hand, there is some polynomial $p$ s.t. if there is no probabilistic algorithm with time complexity $p (t (k))$ and error probability $\frac{1}{6}$, then there is no estimator with time complexity $t (k)$ and error $\frac{1}{5}$ (we should assume that $p$ is sufficient to amplify error probability $\frac{1}{\sqrt{5}}$ to $\frac{1}{6}$; for reasonable computational models a linear polynomial is enough).
    
    Now, you’re right that there is no complete proof of a probabilistic time hierarchy, however:
    
    a. The hierarchy is known given O(1) advice.
    
    b. The hierarchy follows given sufficient derandomization assumptions.
    - paulfchristiano 23 Apr 2016 17:42 UTC
      LW: 1 AF: 1
      AF Parent
      I agree that probabilistic time hierarchy would imply time hierarchy for estimation problems. Time hierarchy is actually known with literally 1 bit of advice.
      
      I believe that time hierarchy with O(1) advice also implies time hierarchy for estimation problems. Consider the advice machine for the hard problem running with advice 0 and the advice machine running with 1. Each of those estimates some function. If both of these functions are quickly estimable, then there is a fast advice machine which solves the hard problem (it uses its advice to decide which of the fast estimators to run), contradicting time hierarchy with advice.
      
      In general, estimation problems seem much nicer than decision problems when we are talking about probabilistic algorithms. For example, every algorithm effectively solves some estimation problem, whereas most probabilistic algorithms don’t solve any decision problem (this is why we can’t eliminate the advice). Relatedly, there are complete estimation problems but no natural complete decision problems for probabilistic time (and hence no natural candidates for a concrete problem to demonstrate probabilistic time hierarchy, which would make our inability to prove it significantly less interesting).
      
      I feel like there should be some intuitive claims that don’t work for BPP because it is a kind of terrible class, but which do work for estimation problems.
      - Vanessa Kosoy 24 Apr 2016 10:33 UTC
        0 points
        AF Parent
        Good point! More precisely, an arbitrary estimator $Q$ may have significant error w.r.t. $E [Q (x)]$ because $Var [Q (x)]$ doesn’t have to be small but it doesn’t matter because we can always amplify $Q$ by running it multiple times and averaging the results. So we still get a hierarchy, just slightly less tight (but this tightness is not very interesting since it depends on the computational model anyway).