In this post I define quasi-optimal predictors, a weaker variant of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (they are completely parallel to the properties of optimal predictors) and give an example where a quasi-optimal predictor exists but no optimal predictor does.

All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.

Definition 1

Given $(D, μ)$ a distributional decision problem, a quasi-optimal predictor for $(D, μ)$ is a family of polynomial size Boolean circuits $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ s.t. for any family of polynomial size Boolean circuits $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ we have

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q^{k} (x) - χ_{D} (x))^{2}] + δ (k)$

where $\lim_{k \to \infty} δ (k) = 0$ .
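As a quick illustration of Definition 1 (a worked instance under the assumption that constant circuits count as a polynomial size family), take the competitor $Q^{k} \equiv \frac{1}{2}$ . Since $χ_{D} (x) \in \{0, 1\}$ , its squared error is exactly $\frac{1}{4}$ on every input, so

$E_{μ^{k}} [(\frac{1}{2} - χ_{D} (x))^{2}] = \frac{1}{4} \quad \Rightarrow \quad E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq \frac{1}{4} + δ (k)$

In other words, the squared error of any quasi-optimal predictor is asymptotically at most $\frac{1}{4}$ .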

Theorem 1

Consider $(D, μ)$ a distributional decision problem and $P$ a quasi-optimal predictor for $(D, μ)$ . Suppose $\{p_{k} \in [0, 1]\}_{k \in \mathbb{N}}$ , $\{q_{k} \in [0, 1]\}_{k \in \mathbb{N}}$ are s.t.

$\exists ϵ > 0 \; \forall k : μ^{k} \{x \in \{0, 1\}^{*} ∣ p_{k} \leq P^{k} (x) \leq q_{k}\} \geq ϵ$

Then:

$\lim_{k \to \infty} E_{μ^{k}} [P^{k} (x) - χ_{D} (x) ∣ p_{k} \leq P^{k} (x) \leq q_{k}] = 0$

Theorem 2

Consider $μ$ a word ensemble and $D_{1}$ , $D_{2}$ disjoint languages. Suppose $P_{1}$ is a quasi-optimal predictor for $(D_{1}, μ)$ and $P_{2}$ is a quasi-optimal predictor for $(D_{2}, μ)$ . Then, $P := η (P_{1} + P_{2})$ is a quasi-optimal predictor for $(D_{1} \cup D_{2}, μ)$ .

Theorem 3

Consider $μ$ a word ensemble and $D_{1}$ , $D_{2}$ disjoint languages. Suppose $P_{1}$ is a quasi-optimal predictor for $(D_{1}, μ)$ and $P$ is a quasi-optimal predictor for $(D_{1} \cup D_{2}, μ)$ . Then, $P_{2} := η (P - P_{1})$ is a quasi-optimal predictor for $(D_{2}, μ)$ .

Theorem 4

Consider $(D_{1}, μ_{1})$ , $(D_{2}, μ_{2})$ distributional decision problems with respective quasi-optimal predictors $P_{1}$ and $P_{2}$ . Define $\{P^{k} : \operatorname{supp} (μ_{1} \times μ_{2})^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ as the family of circuits computing $P^{k} ((x_{1}, x_{2})) := P_{1}^{k} (x_{1}) P_{2}^{k} (x_{2})$ . Then, $P$ is a quasi-optimal predictor for $(D_{1} \times D_{2}, μ_{1} \times μ_{2})$ .

Theorem 5

Consider $C, D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $P_{D}$ is a quasi-optimal predictor for $(D, μ)$ and $P_{C ∣ D}$ is a quasi-optimal predictor for $(C, μ ∣ D)$ . Then $P_{D} P_{C ∣ D}$ is a quasi-optimal predictor for $(C \cap D, μ)$ .

Theorem 6

Consider $C, D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $\exists ϵ > 0 \; \forall k : μ^{k} (D) \geq ϵ$ . Assume $P_{D}$ is a quasi-optimal predictor for $(D, μ)$ and $P_{C \cap D}$ is a quasi-optimal predictor for $(C \cap D, μ)$ . Define $P_{C ∣ D}$ as the circuit family computing

$P_{C ∣ D}^{k} (x) := \begin{cases} 1 & \text{if } P_{D}^{k} (x) = 0 \\ η (\frac{P_{C \cap D}^{k} (x)}{P_{D}^{k} (x)}) \text{ rounded to } k \text{ binary places} & \text{if } P_{D}^{k} (x) > 0 \end{cases}$

Then, $P_{C ∣ D}$ is a quasi-optimal predictor for $(C, μ ∣ D)$ .

Definition 2

Consider $μ$ a word ensemble and $\{Q_{1, 2}^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ two circuit families. We say $Q_{1}$ is quasisimilar to $Q_{2}$ relative to $μ$ (denoted $Q_{1} \overset{μ}{\approx} Q_{2}$ ) when $\lim_{k \to \infty} E_{μ^{k}} [(Q_{1}^{k} (x) - Q_{2}^{k} (x))^{2}] = 0$ .

Theorem 7

Consider $(D, μ)$ a distributional decision problem, $P$ a quasi-optimal predictor for $(D, μ)$ and $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ a polynomial size family. Then, $Q$ is a quasi-optimal predictor for $(D, μ)$ if and only if $P \overset{μ}{\approx} Q$ .

Definition 3

Consider $(C, μ)$ , $(D, ν)$ distributional decision problems, $\{f^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} \{0, 1\}^{*}\}_{k \in \mathbb{N}}$ a polynomial size family of circuits. $f$ is called a (non-uniform) strong pseudo-invertible reduction of $(C, μ)$ to $(D, ν)$ when there is a polynomial $p : \mathbb{N} \to \mathbb{N}$ s.t. the following conditions hold:

(i) $\forall k \in \mathbb{N}, x \in \operatorname{supp} μ^{k} : χ_{D} (f^{k} (x)) = χ_{C} (x)$

(ii) There is $M \in \mathbb{R}$ s.t.

$\forall k \in \mathbb{N}, y \in \{0, 1\}^{*} : \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)} \leq M$

(iii) There is a polynomial $q : \mathbb{N} \to \mathbb{N}$ and a family of polynomial size circuits $\{g^{k} : \operatorname{supp} ν^{p (k)} \times \{0, 1\}^{q (k)} \xrightarrow{circ} \{0, 1\}^{*}\}_{k \in \mathbb{N}}$ s.t.

$\forall k \in \mathbb{N}, y \in f^{k} (\operatorname{supp} μ^{k}), x^{*} \in \{0, 1\}^{*} : \Pr_{U^{q (k)}} [g^{k} (y, r) = x^{*}] = \Pr_{μ^{k}} [x = x^{*} ∣ f^{k} (x) = y]$

(iv) There are polynomial size circuits $\{R^{k} : \operatorname{supp} ν^{p (k)} \xrightarrow{circ} \mathbb{Q}^{\geq 0}\}_{k \in \mathbb{N}}$ s.t.

$\forall k \in \mathbb{N}, y \in \operatorname{supp} ν^{p (k)} : R^{k} (y) = \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)}$

Theorem 8

Consider $(C, μ)$ , $(D, ν)$ distributional decision problems, $f$ a strong pseudo-invertible reduction of $(C, μ)$ to $(D, ν)$ and $P_{D}$ a quasi-optimal predictor for $(D, ν)$ . Define $\{P_{C}^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ as the family of circuits computing $P_{C}^{k} (x) := P_{D}^{p (k)} (f^{k} (x))$ . Then, $P_{C}$ is a quasi-optimal predictor for $(C, μ)$ .

Theorem 9

Consider $f : \{0, 1\}^{*} \to \{0, 1\}^{*}$ a one-to-one non-uniformly hard one-way function. Define $\tilde{μ}_{f}^{k} := \frac{1}{k} \sum_{i < k} μ_{f}^{i}$ . Then, $P_{f}$ is a quasi-optimal predictor for $(D_{f}, \tilde{μ}_{f})$ .

Appendix

Lemma 1

Consider $(D, μ)$ a distributional decision problem and $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ a family of polynomial size. Then, $P$ is a quasi-optimal predictor if and only if there is a function $δ : \mathbb{N} \times \mathbb{N} \to [0, 1]$ s.t.

(i) $δ$ is non-decreasing in the second argument.

(ii) For any polynomial $p : \mathbb{N} \to \mathbb{N}$ :

$\lim_{k \to \infty} δ (k, p (k)) = 0$

In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.

(iii) For any $k \in \mathbb{N}$ and any circuit $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ we have

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + δ (k, | Q |)$

Proof of Lemma 1

Define

$δ (k, q) := \max_{| Q | \leq q} \{E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}]\}$

Lemma 2

Consider $(D, μ)$ a distributional decision problem and $P$ a corresponding quasi-optimal predictor. Then, there is a function $δ : \mathbb{N} \times \mathbb{N} \times \mathbb{N} \to [0, 1]$ s.t.

(i) $δ$ is non-decreasing in the second and third arguments.

(ii) For all polynomials $p, q : \mathbb{N} \to \mathbb{N}$ :

$\lim_{k \to \infty} δ (k, p (k), q (k)) = 0$

(iii) For all $k \in \mathbb{N}$ , $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ and $w : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q}^{\geq 0}$ we have

$E_{μ^{k}} [w (x) (P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [w (x) (Q (x) - χ_{D} (x))^{2}] + (max w) δ (k, | Q |, | w |)$

Proof of Lemma 2

Given $t \in [0, \max w]$ , denote

$α (t) := \min \{s \geq t ∣ \exists x \in \operatorname{supp} μ^{k} : w (x) = s\}$

Consider the circuit $Q_{t} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ computing the following function:

$Q_{t} (x) := \begin{cases} Q (x) & \text{if } w (x) \geq α (t) \\ P^{k} (x) & \text{if } w (x) < α (t) \end{cases}$

There is a polynomial $q$ s.t. $| Q_{t} | \leq q (k, | Q |, | w |)$ . By Lemma 1,

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(Q_{t} (x) - χ_{D} (x))^{2}] + δ (k, q (k, | Q |, | w |))$

for $δ$ quasinegligible.

Rearranging,

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2} - (Q_{t} (x) - χ_{D} (x))^{2}] \leq δ (k, q (k, | Q |, | w |))$

By the definition of $α (t)$ , $w (x) \geq α (t)$ if and only if $w (x) \geq t$ , and $Q_{t}$ agrees with $P^{k}$ wherever $w (x) < α (t)$ . Hence, taking $θ (0) = 1$ ,

$E_{μ^{k}} [θ (w (x) - t) ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq δ (k, q (k, | Q |, | w |))$

Integrating the inequality with respect to $t$ from $0$ to $\max w$ , we get

$E_{μ^{k}} [\int_{0}^{\max w} θ (w (x) - t) \, d t \, ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq (\max w) δ (k, q (k, | Q |, | w |))$

$E_{μ^{k}} [w (x) ((P^{k} (x) - χ_{D} (x))^{2} - (Q (x) - χ_{D} (x))^{2})] \leq (\max w) δ (k, q (k, | Q |, | w |))$
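The last step of the proof of Lemma 2 rests on a "layer cake" identity for the weight. A short check, assuming $θ$ is the step function (equal to $1$ on non-negative arguments and $0$ otherwise) and $0 \leq w (x) \leq \max w$ :

$\int_{0}^{\max w} θ (w (x) - t) \, d t = \int_{0}^{w (x)} 1 \, d t + \int_{w (x)}^{\max w} 0 \, d t = w (x)$

The value of $θ$ at the single point $t = w (x)$ does not affect the integral, so the identity holds regardless of the convention chosen for $θ (0)$ .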

Proof of Theorem 1

Define

$ϕ_{k} := E_{μ^{k}} [χ_{D} (x) - P^{k} (x) ∣ p_{k} \leq P^{k} (x) \leq q_{k}]$

Assume to the contrary that there is $ϵ > 0$ and an infinite set $I \subseteq N$ s.t.

$\forall k \in I : | ϕ_{k} | \geq ϵ$

Define $\{w^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} \{0, 1\}\}_{k \in \mathbb{N}}$ as the circuits computing

$w^{k} (x) := θ (P^{k} (x) - p_{k}) θ (q_{k} - P^{k} (x))$

$| w^{k} |$ is bounded by a polynomial: $P^{k}$ produces binary fractions of polynomial size, so they can be compared to the fixed numbers $p_{k}, q_{k}$ by a polynomial size circuit even if the latter have infinite binary expansions.

We have

$ϕ_{k} = \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]}$

Define $ψ_{k}$ to be $ϕ_{k}$ truncated to the first significant binary digit. Define $\{Q^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ as the circuits computing

$Q^{k} (x) := η (P^{k} (x) + ψ_{k})$

By the assumption, $ψ_{k}$ has binary notation of bounded size, therefore $| Q^{k} |$ is bounded by a polynomial.

Applying Lemma 2 we get

$\forall k \in I : E_{μ^{k}} [w^{k} (x) (P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [w^{k} (x) (Q^{k} (x) - χ_{D} (x))^{2}] + δ (k)$

for $δ$ vanishing at infinity.

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (Q^{k} (x) - χ_{D} (x))^{2})] \leq δ (k)$

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (η (P^{k} (x) + ψ_{k}) - χ_{D} (x))^{2})] \leq δ (k)$

Obviously $(η (P^{k} (x) + ψ_{k}) - χ_{D} (x))^{2} \leq (P^{k} (x) + ψ_{k} - χ_{D} (x))^{2}$ , therefore

$\forall k \in I : E_{μ^{k}} [w^{k} (x) ((P^{k} (x) - χ_{D} (x))^{2} - (P^{k} (x) + ψ_{k} - χ_{D} (x))^{2})] \leq δ (k)$

$\forall k \in I : ψ_{k} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - ψ_{k})] \leq δ (k)$

The expression on the left hand side is a quadratic polynomial in $ψ_{k}$ which attains its maximum at $ϕ_{k}$ and has roots at $0$ and $2 ϕ_{k}$ . $ψ_{k}$ is between $0$ and $ϕ_{k}$ , but not closer to $0$ than $\frac{ϕ_{k}}{2}$ . Therefore, the inequality is preserved if we replace $ψ_{k}$ by $\frac{ϕ_{k}}{2}$ .
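To make the quadratic explicit, here is a worked expansion that uses only the formula $ϕ_{k} E_{μ^{k}} [w^{k} (x)] = E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]$ from above:

$ψ_{k} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - ψ_{k})] = E_{μ^{k}} [w^{k} (x)] \, ψ_{k} (2 ϕ_{k} - ψ_{k})$

The factored form shows the roots $0$ and $2 ϕ_{k}$ and the maximum at $ψ_{k} = ϕ_{k}$ . Evaluated at $ψ_{k} = \frac{ϕ_{k}}{2}$ it gives $\frac{3}{4} E_{μ^{k}} [w^{k} (x)] ϕ_{k}^{2}$ .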

$\forall k \in I : \frac{ϕ_{k}}{2} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - \frac{ϕ_{k}}{2})] \leq δ (k)$

Substituting the equation for $ϕ_{k}$ we get

$\forall k \in I : \frac{1}{2} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]} E_{μ^{k}} [w^{k} (x) (2 (χ_{D} (x) - P^{k} (x)) - \frac{1}{2} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]}{E_{μ^{k}} [w^{k} (x)]})] \leq δ (k)$

$\forall k \in I : \frac{3}{4} \frac{E_{μ^{k}} [w^{k} (x) (χ_{D} (x) - P^{k} (x))]^{2}}{E_{μ^{k}} [w^{k} (x)]} \leq δ (k)$

$\forall k \in I : \frac{3}{4} E_{μ^{k}} [w^{k} (x)] ϕ_{k}^{2} \leq δ (k)$

$\forall k \in I : ϕ_{k}^{2} \leq \frac{4}{3} E_{μ^{k}} [w^{k} (x)]^{- 1} δ (k)$

$\forall k \in I : ϕ_{k}^{2} \leq \frac{4}{3} μ^{k} \{x \in \{0, 1\}^{*} ∣ p_{k} \leq P^{k} (x) \leq q_{k}\}^{- 1} δ (k)$

Thus $ϕ_{k}$ vanishes at infinity on $I$ , which is a contradiction.

Lemma 3

Consider $(D, μ)$ a distributional decision problem. If $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ is a quasi-optimal predictor for $(D, μ)$ then there are $c_{1}, c_{2} \in \mathbb{R}$ and a quasinegligible function $δ^{*}$ s.t. for any $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q}$ we have

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq (c_{1} + c_{2} E_{μ^{k}} [Q (x)^{2}]) δ^{*} (k, | Q |)$

Conversely, suppose $M \in \mathbb{Q}$ and $\{P^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q} \cap [- M, + M]\}_{k \in \mathbb{N}}$ is a polynomial size family for which there is a quasinegligible function $δ^{*}$ s.t. for any $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q} \cap [- M - 1, + M]$ we have

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | Q |)$

Define $\{\tilde{P}^{k} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]\}_{k \in \mathbb{N}}$ to be s.t. computing $\tilde{P}^{k} (x)$ is equivalent to computing $η (P^{k} (x))$ rounded to $k$ digits after the binary point. Then, $\tilde{P}$ is a quasi-optimal predictor.

Proof of Lemma 3

Assume $P$ is a quasi-optimal predictor. Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q}$ and $t = σ 2^{- a}$ where $σ \in \{\pm 1\}$ and $a \in \mathbb{N}$ . The function $η (P^{k} (x) + t Q (x))$ can be approximated by a circuit of size $p (k, | Q |)$ for some fixed polynomial $p$ , within rounding error $ϵ_{k} (x)$ s.t. $\forall x \in \operatorname{supp} μ^{k} : | ϵ_{k} (x) | \leq 2^{- k}$ . By Lemma 1,

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] \leq E_{μ^{k}} [(η (P^{k} (x) + t Q (x)) + ϵ_{k} (x) - χ_{D} (x))^{2}] + δ (k, | Q |)$

where $δ$ is quasinegligible. $ϵ$ is bounded by a negligible function and therefore can be ignored by redefining $δ$ . As in the proof of Theorem 1, $η$ can be dropped.

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2} - (P^{k} (x) + t Q (x) - χ_{D} (x))^{2}] \leq δ (k, | Q |)$

The expression on the left hand side is a quadratic polynomial in $t$ . Explicitly:

$- E_{μ^{k}} [Q (x)^{2}] t^{2} - 2 E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] t \leq δ (k, | Q |)$

Moving $E_{μ^{k}} [Q (x)^{2}] t^{2}$ to the right hand side and dividing both sides by $2 | t | = 2^{1 - a}$ we get

$- E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] σ \leq 2^{a - 1} δ (k, | Q |) + E_{μ^{k}} [Q (x)^{2}] 2^{- a - 1}$

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq 2^{a - 1} δ (k, | Q |) + E_{μ^{k}} [Q (x)^{2}] 2^{- a - 1}$

Take $a := - \frac{1}{2} log δ (k, | Q |) + ϕ (k)$ where $ϕ (k) \in [- \frac{1}{2}, + \frac{1}{2}]$ is the rounding error. We get

$| E_{μ^{k}} [Q (x) (P^{k} (x) - χ_{D} (x))] | \leq 2^{ϕ (k) - 1} δ (k, | Q |)^{\frac{1}{2}} + E_{μ^{k}} [Q (x)^{2}] 2^{- ϕ (k) - 1} δ (k, | Q |)^{\frac{1}{2}}$
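The substitution can be checked term by term. This is a sketch assuming the logarithm is base $2$ and $δ (k, | Q |) > 0$ , writing $δ$ for $δ (k, | Q |)$ :

$2^{a - 1} δ = 2^{ϕ (k) - 1} δ^{- \frac{1}{2}} δ = 2^{ϕ (k) - 1} δ^{\frac{1}{2}} , \qquad 2^{- a - 1} = 2^{- ϕ (k) - 1} δ^{\frac{1}{2}}$

Since $ϕ (k) \in [- \frac{1}{2}, + \frac{1}{2}]$ , both prefactors are at most $2^{- \frac{1}{2}}$ . Under these assumptions one admissible choice in the statement of the lemma is $c_{1} = c_{2} = 2^{- \frac{1}{2}}$ and $δ^{*} = δ^{\frac{1}{2}}$ , which is again quasinegligible.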

Conversely, assume that for any $R : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q} \cap [- M - 1, + M]$

$| E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | R |)$

Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ . We have

$E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] = E_{μ^{k}} [(Q (x) - P^{k} (x) + P^{k} (x) - χ_{D} (x))^{2}]$

$E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] = E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] + E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] + 2 E_{μ^{k}} [(Q (x) - P^{k} (x)) (P^{k} (x) - χ_{D} (x))]$

$2 E_{μ^{k}} [(P^{k} (x) - Q (x)) (P^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}]$

$P^{k} (x) - Q (x)$ can be computed by a circuit $R$ of size polynomial in $| Q |$ and $k$ . Applying the assumption we get

$E_{μ^{k}} [(P^{k} (x) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] + E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] \leq \tilde{δ} (k, | Q |)$

where $\tilde{δ}$ is quasinegligible. Noting that $E_{μ^{k}} [(Q (x) - P^{k} (x))^{2}] \geq 0$ and $(η (P^{k} (x)) - χ_{D} (x))^{2} \leq (P^{k} (x) - χ_{D} (x))^{2}$ we get

$E_{μ^{k}} [(η (P^{k} (x)) - χ_{D} (x))^{2}] - E_{μ^{k}} [(Q (x) - χ_{D} (x))^{2}] \leq \tilde{δ} (k, | Q |)$

Observing that $\tilde{P} - η (P)$ is bounded by a negligible function, we get the desired result.

Proof of Theorem 2

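Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q}$ . Since $D_{1}$ and $D_{2}$ are disjoint, $χ_{D_{1} \cup D_{2}} = χ_{D_{1}} + χ_{D_{2}}$ , so we have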

$E_{μ^{k}} [Q (x) (P_{1}^{k} (x) + P_{2}^{k} (x) - χ_{D_{1} \cup D_{2}} (x))] = E_{μ^{k}} [Q (x) (P_{1}^{k} (x) - χ_{D_{1}} (x))] + E_{μ^{k}} [Q (x) (P_{2}^{k} (x) - χ_{D_{2}} (x))]$

Using Lemma 3:

$| E_{μ^{k}} [Q (x) (P_{1}^{k} (x) - χ_{D_{1}} (x))] | \leq (c_{11} + c_{12} E_{μ^{k}} [Q (x)^{2}]) δ_{1} (k, | Q |)$

$| E_{μ^{k}} [Q (x) (P_{2}^{k} (x) - χ_{D_{2}} (x))] | \leq (c_{21} + c_{22} E_{μ^{k}} [Q (x)^{2}]) δ_{2} (k, | Q |)$

Therefore

$| E_{μ^{k}} [Q (x) (P_{1}^{k} (x) + P_{2}^{k} (x) - χ_{D_{1} \cup D_{2}} (x))] | \leq (c_{11} + c_{21} + (c_{12} + c_{22}) E_{μ^{k}} [Q (x)^{2}]) (δ_{1} (k, | Q |) + δ_{2} (k, | Q |))$

Using Lemma 3 again we get the desired result.

Proof of Theorem 4

We have

$P^{k} ((x_{1}, x_{2})) - χ_{D_{1} \times D_{2}} ((x_{1}, x_{2})) = (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2}) + P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))$
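Expanding the right hand side confirms this decomposition. As shorthand introduced only for this check, write $P_{i}$ for $P_{i}^{k} (x_{i})$ and $χ_{i}$ for $χ_{D_{i}} (x_{i})$ :

$(P_{1} - χ_{1}) χ_{2} + P_{1} (P_{2} - χ_{2}) = P_{1} χ_{2} - χ_{1} χ_{2} + P_{1} P_{2} - P_{1} χ_{2} = P_{1} P_{2} - χ_{1} χ_{2}$

Here $P_{1} P_{2} = P^{k} ((x_{1}, x_{2}))$ by definition and $χ_{1} χ_{2} = χ_{D_{1} \times D_{2}} ((x_{1}, x_{2}))$ .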

Therefore, for any $Q : \operatorname{supp} (μ_{1} \times μ_{2})^{k} \xrightarrow{circ} \mathbb{Q} \cap [- 1, + 1]$

$| E_{(μ_{1} \times μ_{2})^{k}} [Q (x) (P^{k} (x) - χ_{D_{1} \times D_{2}} (x))] | \leq | E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | + | E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] |$

By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | \leq E_{μ_{2}^{k}} [| E_{μ_{1}^{k}} [χ_{D_{2}} (x_{2}) Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1}))] |]$

For any given $x_{2}$ , $χ_{D_{2}} (x_{2}) Q ((x_{1}, x_{2}))$ can be computed by a circuit with input $x_{1}$ of size polynomial in $| x_{2} |$ and $| Q |$ . Applying Lemma 3 to $P_{1}$ , we get

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) (P_{1}^{k} (x_{1}) - χ_{D_{1}} (x_{1})) χ_{D_{2}} (x_{2})] | \leq E_{μ_{2}^{k}} [δ_{1} (k, p_{1} (| x_{2} |, | Q |))]$

where $p_{1}$ is a polynomial and $δ_{1}$ is quasinegligible. Since $| x_{2} |$ is bounded by a polynomial in $k$ for $x_{2} \in supp μ_{2}^{k}$ , we get the bound we need.

For the second term, we have

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] | \leq E_{μ_{1}^{k}} [| E_{μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] |]$

For any given $x_{1}$ , $Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1})$ can be computed by a circuit with input $x_{2}$ of size polynomial in $k$ , $| x_{1} |$ and $| Q |$ . Applying Lemma 3 to $P_{2}$ , we get

$| E_{μ_{1}^{k} \times μ_{2}^{k}} [Q ((x_{1}, x_{2})) P_{1}^{k} (x_{1}) (P_{2}^{k} (x_{2}) - χ_{D_{2}} (x_{2}))] | \leq E_{μ_{1}^{k}} [δ_{2} (k, p_{2} (k, | x_{1} |, | Q |))]$

Again, we got the required bound.

Proof of Theorem 7

Assume $Q$ is a quasi-optimal predictor. Applying Lemma 3 to predictor $P$ and circuits computing $P^{k} - Q^{k}$ , we get

$| E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] | \leq δ (k)$

for some $δ$ vanishing at infinity. Applying Lemma 3 to predictor $Q$ and circuits computing $P^{k} - Q^{k}$ , we get

$| E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))] | \leq ϵ (k)$

for some $ϵ$ vanishing at infinity. We have

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] = E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] - E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))]$
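This identity holds pointwise, before taking expectations. Writing $P$ , $Q$ , $χ$ for $P^{k} (x)$ , $Q^{k} (x)$ , $χ_{D} (x)$ (shorthand used only here) and factoring out $P - Q$ :

$(P - Q) (P - χ) - (P - Q) (Q - χ) = (P - Q) ((P - χ) - (Q - χ)) = (P - Q)^{2}$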

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] \leq | E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (P^{k} (x) - χ_{D} (x))] | + | E_{μ^{k}} [(P^{k} (x) - Q^{k} (x)) (Q^{k} (x) - χ_{D} (x))] |$

$E_{μ^{k}} [(P^{k} (x) - Q^{k} (x))^{2}] \leq δ (k) + ϵ (k)$

Conversely, assume $P \overset{μ}{\approx} Q$ . Consider some $R : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ . We have

$E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x) + P^{k} (x) - χ_{D} (x))]$

$E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] = E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x))] + E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))]$

$| E_{μ^{k}} [R (x) (Q^{k} (x) - P^{k} (x))] | \leq E_{μ^{k}} [| Q^{k} (x) - P^{k} (x) |] \leq \sqrt{E_{μ^{k}} [(Q^{k} (x) - P^{k} (x))^{2}]} \leq δ (k)$

for some $δ$ vanishing at infinity, since $P \overset{μ}{\approx} Q$ .

$| E_{μ^{k}} [R (x) (P^{k} (x) - χ_{D} (x))] | \leq δ^{*} (k, | R |)$

for some quasinegligible $δ^{*}$ , using Lemma 3. Combining both inequalities we get

$| E_{μ^{k}} [R (x) (Q^{k} (x) - χ_{D} (x))] | \leq δ (k) + δ^{*} (k, | R |)$

Using Lemma 3 again we conclude $Q$ is a quasi-optimal predictor.

Lemma 4

Consider $C \subseteq D \subseteq \{0, 1\}^{*}$ and $μ$ a word ensemble. Assume $P_{C}$ is a quasi-optimal predictor for $(C, μ)$ and $P_{D}$ is a quasi-optimal predictor for $(D, μ)$ . Define

$ϵ^{k} (x) := θ (P_{C}^{k} (x) - P_{D}^{k} (x)) (P_{C}^{k} (x) - P_{D}^{k} (x))$

Then, $\lim_{k \to \infty} E_{μ^{k}} [ϵ^{k} (x)] = 0$ .

Proof of Lemma 4

By Theorem 3 and Lemma 3 there is a quasinegligible function $δ$ such that for any $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q} \cap [- 2, + 1]$ we have

$| E_{μ^{k}} [Q (x) (P_{D}^{k} (x) - P_{C}^{k} (x) - χ_{D ∖ C} (x))] | \leq δ (k, | Q |)$

Take $Q$ to be the circuit computing $θ (P_{C}^{k} (x) - P_{D}^{k} (x))$ . Its size is polynomial in $k$ therefore

$| E_{μ^{k}} [θ (P_{C}^{k} (x) - P_{D}^{k} (x)) (P_{D}^{k} (x) - P_{C}^{k} (x) - χ_{D ∖ C} (x))] | \leq δ^{*} (k)$

where $δ^{*}$ vanishes at infinity.

$| E_{μ^{k}} [ϵ^{k} (x)] + E_{μ^{k}} [θ (P_{C}^{k} (x) - P_{D}^{k} (x)) χ_{D ∖ C} (x)] | \leq δ^{*} (k)$

Since both terms inside the absolute value are non-negative we get the desired result.

Proof of Theorem 6

When $P_{D}^{k} (x) > 0$ we have

$P_{C ∣ D}^{k} (x) = \frac{min (P_{C \cap D}^{k} (x), P_{D}^{k} (x))}{P_{D}^{k} (x)}$

Define $\tilde{P}_{C \cap D}^{k}$ to be the circuit computing $\min (P_{C \cap D}^{k} (x), P_{D}^{k} (x))$ . Since $C \cap D \subseteq D$ , Lemma 4 implies that $\lim_{k \to \infty} E_{μ^{k}} [P_{C \cap D}^{k} (x) - \tilde{P}_{C \cap D}^{k} (x)] = 0$ . The difference lies in $[0, 1]$ , so its square is no larger, which implies $\lim_{k \to \infty} E_{μ^{k}} [(P_{C \cap D}^{k} (x) - \tilde{P}_{C \cap D}^{k} (x))^{2}] = 0$ and by Theorem 7 $\tilde{P}_{C \cap D}$ is a quasi-optimal predictor for $(C \cap D, μ)$ .

We have $\tilde{P}_{C \cap D}^{k} (x) = P_{C ∣ D}^{k} (x) P_{D}^{k} (x)$ (whether $P_{D}^{k} (x) > 0$ or $P_{D}^{k} (x) = 0$ ) and therefore

$\tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x) = (P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x) + P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))$

$(P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x) = \tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x) - P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))$

Consider $Q : \operatorname{supp} μ^{k} \xrightarrow{circ} \mathbb{Q} \cap [- 1, + 1]$ .

$E_{μ^{k} ∣ D} [Q (x) (P_{C ∣ D}^{k} (x) - χ_{C} (x))] = μ^{k} (D)^{- 1} E_{μ^{k}} [Q (x) (P_{C ∣ D}^{k} (x) - χ_{C} (x)) χ_{D} (x)]$

By Lemma 3 it is sufficient to prove appropriate bounds on $| E_{μ^{k}} [Q (x) (\tilde{P}_{C \cap D}^{k} (x) - χ_{C \cap D} (x))] |$ and $| E_{μ^{k}} [Q (x) P_{C ∣ D}^{k} (x) (P_{D}^{k} (x) - χ_{D} (x))] |$ . Both bounds follow from Lemma 3 using the facts that $\tilde{P}_{C \cap D}$ and $P_{D}$ are quasi-optimal predictors and $| P_{C ∣ D}^{k} |$ is bounded by a polynomial.

Proof of Theorem 8

Consider $k \in \mathbb{N}$ , $Q_{C} : \operatorname{supp} μ^{k} \xrightarrow{circ} [0, 1]$ . Define $Q_{D} : \operatorname{supp} ν^{p (k)} \times \{0, 1\}^{q (k)} \xrightarrow{circ} [0, 1]$ to be the circuit computing $Q_{D} (y, r) := Q_{C} (g^{k} (y, r))$ . Applying Lemma 2, treating $r$ as a constant and using $R^{k}$ as the weight circuit, we get

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] \leq E_{ν^{p (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] + δ (k, | Q_{C} |)$

where $δ$ is quasinegligible. We used condition (ii) to get a constant bound on $max R^{k}$ and condition (iv) to get a polynomial bound on $| R^{k} |$ .

We take the expectation value of both sides with respect to the uniform measure over $r$ :

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] \leq E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] + δ (k, | Q_{C} |)$

The left hand side can be rewritten as follows

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} ν^{p (k)} (y) \frac{μ^{k} ((f^{k})^{- 1} (y))}{ν^{p (k)} (y)} (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} μ^{k} ((f^{k})^{- 1} (y)) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} \sum_{\substack{x \in \operatorname{supp} μ^{k} \\ f^{k} (x) = y}} μ^{k} (x) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}$

Grouping the sum by $x$ , we get

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = \sum_{x \in \operatorname{supp} μ^{k}} μ^{k} (x) (P_{C}^{k} (x) - χ_{C} (x))^{2}$

$E_{ν^{p (k)}} [R^{k} (y) (P_{D}^{p (k)} (y) - χ_{D} (y))^{2}] = E_{μ^{k}} [(P_{C}^{k} (x) - χ_{C} (x))^{2}]$

The first term on the right hand side can be rewritten as

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{y \in \{0, 1\}^{*}} \sum_{r \in \{0, 1\}^{q (k)}} 2^{- q (k)} μ^{k} ((f^{k})^{- 1} (y)) (Q_{D} (y, r) - χ_{D} (y))^{2}$

Grouping the sum by $x := g^{k} (y, r)$ we get:

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{x \in \{0, 1\}^{*}} \sum_{y \in \{0, 1\}^{*}} \sum_{\substack{r \in \{0, 1\}^{q (k)} \\ g^{k} (y, r) = x}} 2^{- q (k)} μ^{k} ((f^{k})^{- 1} (y)) (Q_{C} (x) - χ_{C} (x))^{2}$

Condition (iii) tells us that $\sum_{\substack{r \in \{0, 1\}^{q (k)} \\ g^{k} (y, r) = x}} 2^{- q (k)}$ is only non-vanishing when $y = f^{k} (x)$ and that in this case it equals $\frac{μ^{k} (x)}{μ^{k} ((f^{k})^{- 1} (y))}$ . Therefore

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = \sum_{x \in \{0, 1\}^{*}} μ^{k} (x) (Q_{C} (x) - χ_{C} (x))^{2}$

$E_{ν^{p (k)} \times U^{q (k)}} [R^{k} (y) (Q_{D} (y, r) - χ_{D} (y))^{2}] = E_{μ^{k}} [(Q_{C} (x) - χ_{C} (x))^{2}]$

Putting everything together, we get

$E_{μ^{k}} [(P_{C}^{k} (x) - χ_{C} (x))^{2}] \leq E_{μ^{k}} [(Q_{C} (x) - χ_{C} (x))^{2}] + δ (k, | Q_{C} |)$

Proof of Theorem 9

Assume to the contrary that $P_{f}$ is not quasi-optimal. Then there is an infinite set $I \subseteq \mathbb{N}$ , a polynomial size family of circuits $\{Q^{k} : \operatorname{supp} \tilde{μ}_{f}^{k} \xrightarrow{circ} [0, 1]\}_{k \in I}$ and $ϵ > 0$ s.t.

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(P_{f}^{k} (x) - χ_{D_{f}} (x))^{2}] \geq E_{\tilde{μ}_{f}^{k}} [(Q^{k} (x) - χ_{D_{f}} (x))^{2}] + ϵ$

Since $P_{f}^{k}$ has squared error $\frac{1}{4}$ (it is the constant predictor $\frac{1}{2}$ in the corresponding construction for optimal predictors), it follows that

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(Q^{k} (x) - χ_{D_{f}} (x))^{2}] \leq \frac{1}{4} - ϵ$

Define the functions $\{q^{k} : \operatorname{supp} \tilde{μ}_{f}^{k} \times [0, 1] \to \{0, 1\}\}_{k \in I}$ by $q^{k} (x, t) := θ (Q^{k} (x) - t)$ . We have

$\forall k \in I, x \in \operatorname{supp} \tilde{μ}_{f}^{k} : Q^{k} (x) = \int_{0}^{1} q^{k} (x, t) \, d t$

Substituting into the inequality above

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [(\int_{0}^{1} q^{k} (x, t) \, d t - χ_{D_{f}} (x))^{2}] \leq \frac{1}{4} - ϵ$

Since $E [| X |]^{2} \leq E [X^{2}]$ for any random variable $X$ ,

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| \int_{0}^{1} q^{k} (x, t) \, d t - χ_{D_{f}} (x) |]^{2} \leq \frac{1}{4} - ϵ$

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| \int_{0}^{1} (q^{k} (x, t) - χ_{D_{f}} (x)) \, d t |] \leq \sqrt{\frac{1}{4} - ϵ}$

For every given $x$ , $q^{k} (x, t) - χ_{D_{f}} (x)$ is either non-negative for all $t$ or non-positive for all $t$ . Hence we can move the absolute value inside the integral:

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [\int_{0}^{1} | q^{k} (x, t) - χ_{D_{f}} (x) | \, d t] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \int_{0}^{1} E_{\tilde{μ}_{f}^{k}} [| q^{k} (x, t) - χ_{D_{f}} (x) |] \, d t \leq \sqrt{\frac{1}{4} - ϵ}$

This implies that we can choose $\{t_{k} \in Q^{k} (\operatorname{supp} \tilde{μ}_{f}^{k}) \cup \{0, 1\}\}_{k \in I}$ s.t.

$\forall k \in I : E_{\tilde{μ}_{f}^{k}} [| q^{k} (x, t_{k}) - χ_{D_{f}} (x) |] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) \neq χ_{D_{f}} (x)] \leq \sqrt{\frac{1}{4} - ϵ}$

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) = χ_{D_{f}} (x)] \geq 1 - \sqrt{\frac{1}{4} - ϵ}$

Using the fact that the graph of the square root lies below its tangent at any point, this leads to

$\forall k \in I : \Pr_{\tilde{μ}_{f}^{k}} [q^{k} (x, t_{k}) = χ_{D_{f}} (x)] \geq \frac{1}{2} + ϵ$
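The tangent step can be spelled out as follows (a worked check, taking the tangent at the point $s = \frac{1}{4}$ ). The derivative of $\sqrt{s}$ at $s = \frac{1}{4}$ is $\frac{1}{2 \sqrt{1 / 4}} = 1$ , so by concavity

$\sqrt{\frac{1}{4} - ϵ} \leq \sqrt{\frac{1}{4}} - 1 \cdot ϵ = \frac{1}{2} - ϵ \quad \Rightarrow \quad 1 - \sqrt{\frac{1}{4} - ϵ} \geq \frac{1}{2} + ϵ$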

Define $\{g^{k} : f (\{0, 1\}^{k}) \times \{0, 1\}^{k} \xrightarrow{circ} \{0, 1\}\}_{k \in \mathbb{N}}$ as the circuits computing $g^{k} (y, r) := 1 - q^{k} ((y, r), t_{k})$ . The definitions of $q^{k}$ and $t_{k}$ imply that $| g^{k} |$ is bounded by a polynomial. The inequality above and the definitions of $D_{f}$ and $\tilde{μ}_{f}$ imply

$\forall k \in I : \frac{1}{k} \sum_{i < k} \Pr_{U^{i} \times U^{i}} [g^{i} (f (x), r) = x \cdot r] \geq \frac{1}{2} + ϵ$

But this contradicts the assumption on $f$ .

Note that this argument doesn’t show $P_{f}$ is optimal: averaging over $i$ preserves the property of vanishing at infinity, but it doesn’t preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for $(D_{f}, \tilde{μ}_{f})$ exists.
