In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.
All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.
Definition 1
Given (D,μ) a distributional decision problem, a quasi-optimal predictor for (D,μ) is a family of polynomial size Boolean circuits {Pk:suppμkcirc−−→[0,1]}k∈N s.t. for any family of polynomial size Boolean circuits {Qk:suppμkcirc−−→[0,1]}k∈N we have
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Qk(x)−χD(x))2]+δ(k)
where limk→∞δ(k)=0.
Theorem 1
Consider (D,μ) a distributional decision problem and P a quasi-optimal predictor for (D,μ). Suppose {pk∈[0,1]}k∈N, {qk∈[0,1]}k∈N are s.t.
∃ϵ>0∀k:μk{x∈{0,1}∗∣pk≤Pk(x)≤qk}≥ϵ
Then:
limk→∞Eμk[Pk(x)−χD(x)∣pk≤Pk(x)≤qk]=0
Theorem 2
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P2 is a quasi-optimal predictor for (D2,μ). Then, P:=η(P1+P2) is a quasi-optimal predictor for (D1∪D2,μ).
Theorem 3
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P is a quasi-optimal predictor for (D1∪D2,μ). Then, P2:=η(P−P1) is a quasi-optimal predictor for (D2,μ).
Theorem 4
Consider (D1,μ1), (D2,μ2) distributional decision problems with respective quasi-optimal predictors P1 and P2. Define {Pk:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing Pk((x1,x2)):=Pk1(x1)Pk2(x2). Then, P is a quasi-optimal predictor for (D1×D2,μ1×μ2).
Theorem 5
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume PD is a quasi-optimal predictor for (D,μ) and PC∣D is a quasi-optimal predictor for (C,μ∣D). Then PDPC∣D is a quasi-optimal predictor for (C∩D,μ).
Theorem 6
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume ∃ϵ>0∀k:μk(D)≥ϵ. Assume PD is a quasi-optimal predictor for (D,μ) and PC∩D is a quasi-optimal predictor for (C∩D,μ). Define PC∣D as the circuit family computing
PkC∣D(x):=⎧⎪⎨⎪⎩1if PkD(x)=0η(PkC∩D(x)PkD(x))rounded to k binary places if PkD(x)>0
Then, PC∣D is a quasi-optimal predictor for (C,μ∣D).
Definition 2
Consider μ a word ensemble and {Qk1,2:suppμkcirc−−→[0,1]}k∈N two circuit families. We say Q1 is quasisimilar to Q2 relative to μ (denoted Q1μ≈Q2) when limk→∞Eμk[(Qk1(x)−Qk2(x))2]=0.
Theorem 7
Consider (D,μ) a distributional decision problem, P a quasi-optimal predictor for (D,μ) and {Qk:suppμkcirc−−→[0,1]}k∈N a polynomial size family. Then, Q is a quasi-optimal predictor for (D,μ) if and only if Pμ≈Q.
Definition 3
Consider (C,μ), (D,ν) distributional decision problems, {fk:suppμkcirc−−→{0,1}∗}k∈N a polynomial size family of circuits.f is called a (non-uniform) strong pseudo-invertible reduction of C to D when there is a polynomial p:N→N s.t. the following conditions hold:
(i) ∀k∈N,x∈suppμk:χD(fk(x))=χC(x)
(ii) There is M∈R s.t.
∀k∈N,y∈{0,1}∗:μk((fk)−1(y))νp(k)(y)≤M
(iii) There is a polynomial q:N→N and a family of polynomial size circuits {gk:suppνp(k)×{0,1}q(k)circ−−→{0,1}∗}k∈N s.t.
(iv) There are polynomial size circuits {Rk:suppνp(k)circ−−→Q≥0}k∈N s.t.
∀k∈N,y∈suppνp(k):Rk(y)=μk((fk)−1(y))νp(k)(y)
Theorem 8
Consider (C,μ), (D,ν) distributional decision problems, f a strong pseudo-invertible reduction of (C,μ) to (D,ν) and PD a quasi-optimal predictor for (D,ν). Define {PkC:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing PkC(x):=Pp(k)D(fk(x)). Then, PC is a quasi-optimal predictor for (C,μ).
Theorem 9
Consider f:{0,1}∗→{0,1}∗ a one-to-one non-uniformly hard one-way function. Define ~μkf:=1k∑i<kμif. Then, Pf is a quasi-optimal predictor for (Df,~μf).
Appendix
Lemma 1
Consider (D,μ) a distributional decision problem and {Pk:suppμkcirc−−→[0,1]}k∈N a family of polynomial size. Then, P is a quasi-optimal predictor if and only if there is a function δ:N×N→[0,1] s.t.
(i) δ is non-decreasing in the second argument.
(ii) For any polynomial p:N→N:
limk→∞δ(k,p(k))=0
In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.
Assume to the contrary that there is ϵ>0 and an infinite set I⊆N s.t.
∀k∈I:|ϕk|≥ϵ
Define {wk:suppμkcirc−−→{0,1}}k∈N as the circuits computing
wk(x):=θ(Pk(x)−pk)θ(qk−Pk(x))
|wk| is bounded by a polynomial since Pk produces binary fractions of polynomial size therefore it is possible to compare them to the fixed numbers pk,qk using a polynomial size circuit even if the latter have infinite binary expansions.
We have
ϕk=Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)]
Define ψk to be ϕk truncated to the first significant binary digit. Define {Qk:suppμkcirc−−→[0,1]}k∈N as the circuits computing
Qk(x):=η(Pk(x)+ψk)
By the assumption, ψk has binary notation of bounded size, therefore |Qk| is bounded by a polynomial.
The expression on the left hand side is a quadratic polynomial in ψk which attains its maximum at ϕk and has roots at 0 and 2ϕk. ψk is between 0 and ϕk, but not closer to 0 than ϕk2. Therefore, the inequality is preserved if we replace ψk by ϕk2.
Thus ϕk vanishes at infinity on I, which is a contradiction.
Lemma 3
Consider (D,μ) a distributional decision problem. If {Pk:suppμkcirc−−→[0,1]}k∈N is a quasi-optimal predictor for (D,μ) then there are c1,c2∈R and a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q we have
Conversely, suppose M∈Q and {Pk:suppμkcirc−−→Q∩[−M,+M]}k∈N is a polynomial size family for which there is a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q∩[−M−1,+M]}k∈N we have
|Eμk[Q(x)(Pk(x)−χD(x))]|≤δ∗(k,|Q|)
Define {~Pk:suppμkcirc−−→[0,1]}k∈N to be s.t. computing ~Pk(x) is equivalent to computing η(Pk(x)) rounded to k digits after the binary point. Then, ~P is a quasi-optimal predictor.
Proof of Lemma 3
Assume P is an optimal predictor. Consider Q:suppμkcirc−−→Q and t=σ2−a where σ∈{±1} and a∈N. The function η(Pk(x)+tQ(x)) can be approximated by a circuit of size p(k,|Q|) for some fixed polynomial p, within rounding error ϵk(x) s.t. ∀x∈suppμk:|ϵk(x)|≤2−k. By Lemma 1,
where δ is quasinegligible.ϵ is bounded by a negligible function and therefore can be ignored by redefining δ. As in the proof of Theorem 1, η can be dropped.
Eμk[(Pk(x)−χD(x))2−(Pk(x)+tQ(x)−χD(x))2]≤δ(k,|Q|)
The expression on the left hand side is a quadratic polynomial in t. Explicitly:
−Eμk[Q(x)2]t2−2Eμk[Q(x)(Pk(x)−χD(x))]t≤δ(k,|Q|)
Moving Eμk[Q(x)2]t2 to the right hand side and dividing both sides by 2|t|=21−a we get
Since both terms inside the absolute value are non-negative we get the desired result.
Proof of Theorem 6
When PkD(x)>0 we have
PkC∣D(x)=min(PkC∩D(x),PkD(x))PkD(x)
Define ~PkC∩D to be the circuit computing min(PkC∩D(x),PkD(x)). Since C∩D⊆D, Lemma 4 implies that limk→∞Eμk[PkC∩D(x)−~PkC∩D(x)]=0. This implies limk→∞Eμk[(PkC∩D(x)−~PkC∩D(x))2]=0 and by Theorem 7 ~PC∩D is a quasi-optimal predictor for (C∩D,μ).
We have ~PkC∩D(x)=PkC∣D(x)PkD(x) (whether PkD(x)>0 or PkD(x)=0) and therefore
By Lemma 3 it is sufficient to prove appropriate bounds on |Eμk[Q(x)(~PkC∩D(x)−χC∩D(x))]| and |Eμk[Q(x)PkC∣D(x)(PkD(x)−χD(x))]|. Both bounds follow from Lemma 3 using the facts ~PC∩D and PD are quasi-optimal predictors and |PkC∣D| is bounded by a polynomial.
Proof of Theorem 8
Consider k∈N, QC:suppμkcirc−−→[0,1]. Define QD:suppνp(k)×{0,1}q(k)circ−−→[0,1] to be the circuit computing QD(y,r):=QC(gk(y,r)). Applying Lemma 2, treating r as a constant and using R as the weight circuit, we get
Condition (iii) tells us that ∑r∈{0,1}q(k)gk(y,r)=x2−q(k) is only non-vanishing when y=fk(x) and that in this case it equals μk(x)μk((fk)−1(y)). Therefore
Assume to the contrary that Pf is not quasi-optimal. Then there is an infinite set I⊆N, a polynomial size family of circuits {Qk:supp~μkfcirc−−→[0,1]}k∈I and ϵ>0 s.t.
Define the functions {qk:supp~μkf×[0,1]→{0,1}}k∈I by qk(x,t):=θ(Qk(x)−t). We have
∀k∈I,x∈supp~μkf:Qk(x)=∫10qk(x,t)dt
Substituting into the inequality above
∀k∈I:E~μkf[(∫10qk(x,t)dt−χDf(x))2]≤14−ϵ
∀k∈I:E~μkf[|∫10qk(x,t)dt−χDf(x)|]2≤14−ϵ
∀k∈I:E~μkf[|∫10(qk(x,t)−χDf(x))dt|]≤√14−ϵ
For every given x, qk(x,t)−χDf(x) is either non-negative for all t or non-positive for t. Hence we can move the absolute value inside the integral:
∀k∈I:E~μkf[∫10|qk(x,t)−χDf(x)|dt]≤√14−ϵ
∀k∈I:∫10E~μkf[|qk(x,t)−χDf(x)|]dt≤√14−ϵ
This implies that we can choose {tk∈Qk(supp~μkf)∪{0,1}}k∈I s.t.
∀k∈I:E~μkf[|qk(x,tk)−χDf(x)|]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)≠χDf(x)]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥1−√14−ϵ
Using the fact that the graph of the square root lies below its tangent at any point, this leads to
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥12+ϵ
Define {gk:f({0,1}k)×{0,1}kcirc−−→{0,1}}k∈N as the circuits computing gk(y,r):=1−qk((y,r),tk). The definitions of qk and tk imply that |gk| is bounded by a polynomial. The inequality above and the definitions of Df and ~μf imply
∀k∈I:1k∑i<kPrUi×Ui[gi(f(x),r)=x⋅r]≥12+ϵ
But this contradicts the assumption on f.
Note that this argument doesn’t show Pf is optimal since while the averaging over i preserves the property of vanishing at infinity, it doesn’t preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for (Df,~μf) exists.
Quasi-optimal predictors
In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.
All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.
Definition 1
Given (D,μ) a distributional decision problem, a quasi-optimal predictor for (D,μ) is a family of polynomial size Boolean circuits {Pk:suppμkcirc−−→[0,1]}k∈N s.t. for any family of polynomial size Boolean circuits {Qk:suppμkcirc−−→[0,1]}k∈N we have
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Qk(x)−χD(x))2]+δ(k)
where limk→∞δ(k)=0.
Theorem 1
Consider (D,μ) a distributional decision problem and P a quasi-optimal predictor for (D,μ). Suppose {pk∈[0,1]}k∈N, {qk∈[0,1]}k∈N are s.t.
∃ϵ>0∀k:μk{x∈{0,1}∗∣pk≤Pk(x)≤qk}≥ϵ
Then:
limk→∞Eμk[Pk(x)−χD(x)∣pk≤Pk(x)≤qk]=0
Theorem 2
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P2 is a quasi-optimal predictor for (D2,μ). Then, P:=η(P1+P2) is a quasi-optimal predictor for (D1∪D2,μ).
Theorem 3
Consider μ a word ensemble and D1, D2 disjoint languages. Suppose P1 is a quasi-optimal predictor for (D1,μ) and P is a quasi-optimal predictor for (D1∪D2,μ). Then, P2:=η(P−P1) is a quasi-optimal predictor for (D2,μ).
Theorem 4
Consider (D1,μ1), (D2,μ2) distributional decision problems with respective quasi-optimal predictors P1 and P2. Define {Pk:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing Pk((x1,x2)):=Pk1(x1)Pk2(x2). Then, P is a quasi-optimal predictor for (D1×D2,μ1×μ2).
Theorem 5
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume PD is a quasi-optimal predictor for (D,μ) and PC∣D is a quasi-optimal predictor for (C,μ∣D). Then PDPC∣D is a quasi-optimal predictor for (C∩D,μ).
Theorem 6
Consider C,D⊆{0,1}∗ and μ a word ensemble. Assume ∃ϵ>0∀k:μk(D)≥ϵ. Assume PD is a quasi-optimal predictor for (D,μ) and PC∩D is a quasi-optimal predictor for (C∩D,μ). Define PC∣D as the circuit family computing
PkC∣D(x):=⎧⎪⎨⎪⎩1if PkD(x)=0η(PkC∩D(x)PkD(x))rounded to k binary places if PkD(x)>0
Then, PC∣D is a quasi-optimal predictor for (C,μ∣D).
Definition 2
Consider μ a word ensemble and {Qk1,2:suppμkcirc−−→[0,1]}k∈N two circuit families. We say Q1 is quasisimilar to Q2 relative to μ (denoted Q1μ≈Q2) when limk→∞Eμk[(Qk1(x)−Qk2(x))2]=0.
Theorem 7
Consider (D,μ) a distributional decision problem, P a quasi-optimal predictor for (D,μ) and {Qk:suppμkcirc−−→[0,1]}k∈N a polynomial size family. Then, Q is a quasi-optimal predictor for (D,μ) if and only if Pμ≈Q.
Definition 3
Consider (C,μ), (D,ν) distributional decision problems, {fk:suppμkcirc−−→{0,1}∗}k∈N a polynomial size family of circuits.f is called a (non-uniform) strong pseudo-invertible reduction of C to D when there is a polynomial p:N→N s.t. the following conditions hold:
(i) ∀k∈N,x∈suppμk:χD(fk(x))=χC(x)
(ii) There is M∈R s.t.
∀k∈N,y∈{0,1}∗:μk((fk)−1(y))νp(k)(y)≤M
(iii) There is a polynomial q:N→N and a family of polynomial size circuits {gk:suppνp(k)×{0,1}q(k)circ−−→{0,1}∗}k∈N s.t.
∀y∈fk(suppμk),x∗∈{0,1}∗:PrUq(k)[gk(y,r)=x∗]=Prμk[x=x∗|fk(x)=y]
(iv) There are polynomial size circuits {Rk:suppνp(k)circ−−→Q≥0}k∈N s.t.
∀k∈N,y∈suppνp(k):Rk(y)=μk((fk)−1(y))νp(k)(y)
Theorem 8
Consider (C,μ), (D,ν) distributional decision problems, f a strong pseudo-invertible reduction of (C,μ) to (D,ν) and PD a quasi-optimal predictor for (D,ν). Define {PkC:suppμkcirc−−→[0,1]}k∈N as the family of circuits computing PkC(x):=Pp(k)D(fk(x)). Then, PC is a quasi-optimal predictor for (C,μ).
Theorem 9
Consider f:{0,1}∗→{0,1}∗ a one-to-one non-uniformly hard one-way function. Define ~μkf:=1k∑i<kμif. Then, Pf is a quasi-optimal predictor for (Df,~μf).
Appendix
Lemma 1
Consider (D,μ) a distributional decision problem and {Pk:suppμkcirc−−→[0,1]}k∈N a family of polynomial size. Then, P is a quasi-optimal predictor if and only if there is a function δ:N×N→[0,1] s.t.
(i) δ is non-decreasing in the second argument.
(ii) For any polynomial p:N→N:
limk→∞δ(k,p(k))=0
In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.
(iii) for any Q:suppμkcirc−−→[0,1] we have
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Q(x)−χD(x))2]+δ(k,|Q|)
Proof of Lemma 1
Define
δ(k,q):=max|Q|≤q{Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]}
Lemma 2
Consider (D,μ) a distributional decision problem and P a corresponding quasi-optimal predictor. Then, there is a function δ:N×N×N→[0,1] s.t.
(i) δ is non-decreasing in the second and third arguments.
(ii) For all polynomials p,q:N→N:
limk→∞δ(k,p(k),q(k))=0
(iii) for all k∈N, Q:suppμkcirc−−→[0,1] and w:suppμkcirc−−→Q≥0 we have
Eμk[w(x)(Pk(x)−χD(x))2]≤Eμk[w(x)(Q(x)−χD(x))2]+(maxw)δ(k,|Q|,|w|)
Proof of Lemma 2
Given t∈[0,maxw], denote
α(t):=min{s≥t∣∃x∈suppμk:w(x)=s}
Consider circuit Qt:suppμkcirc−−→[0,1] computing the following function:
Qt(x):={Q(x)if w(x)≥α(t)Pk(x)if w(x)<α(t)
There is a polynomial q s.t. |Qt|≤q(k,|Q|,|w|). By Lemma 1,
Eμk[(Pk(x)−χD(x))2]≤Eμk[(Qt(x)−χD(x))2]+δ(k,q(k,|Q|,|w|))
for δ quasinegligible.
Eμk[(Pk(x)−χD(x))2−(Qt(x)−χD(x))2]≤δ(k,q(k,|Q|,|w|))
Eμk[θ(w(x)−t)(Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤δ(k,q(k,|Q|,|w|))
Integrating the inequality with respect to t from 0 to maxw, we get
Eμk[∫maxw0θ(w(x)−t)dt((Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤(maxw)δ(k,q(k,|Q|,|w|))
Eμk[w(x)(Pk(x)−χD(x))2−(Q(x)−χD(x))2]≤(maxw)δ(k,q(k,|Q|,|w|))
Proof of Theorem 1
Define
ϕk:=Eμk[χD(x)−Pk(x)∣pk≤Pk(x)≤qk]
Assume to the contrary that there is ϵ>0 and an infinite set I⊆N s.t.
∀k∈I:|ϕk|≥ϵ
Define {wk:suppμkcirc−−→{0,1}}k∈N as the circuits computing
wk(x):=θ(Pk(x)−pk)θ(qk−Pk(x))
|wk| is bounded by a polynomial since Pk produces binary fractions of polynomial size therefore it is possible to compare them to the fixed numbers pk,qk using a polynomial size circuit even if the latter have infinite binary expansions.
We have
ϕk=Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)]
Define ψk to be ϕk truncated to the first significant binary digit. Define {Qk:suppμkcirc−−→[0,1]}k∈N as the circuits computing
Qk(x):=η(Pk(x)+ψk)
By the assumption, ψk has binary notation of bounded size, therefore |Qk| is bounded by a polynomial.
Applying Lemma 2 we get
∀k∈I:Eμk[wk(x)(Pk(x)−χD(x))2]≤Eμk[wk(x)(Qk(x)−χD(x))2]+δ(k)
for δ vanishing at infinity.
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(Qk(x)−χD(x))2)]≤δ(k)
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(η(Pk(x)+ψk)−χD(x))2)]≤δ(k)
Obviously (η(Pk(x)+ψk)−χD(x))2≤(Pk(x)+ψk−χD(x))2, therefore
∀k∈I:Eμk[wk(x)((Pk(x)−χD(x))2−(Pk(x)+ψk−χD(x))2)]≤δ(k)
∀k∈I:ψkEμk[wk(x)(2(χD(x)−Pk(x))−ψk)]≤δ(k)
The expression on the left hand side is a quadratic polynomial in ψk which attains its maximum at ϕk and has roots at 0 and 2ϕk. ψk is between 0 and ϕk, but not closer to 0 than ϕk2. Therefore, the inequality is preserved if we replace ψk by ϕk2.
∀k∈I:ϕk2Eμk[wk(x)(2(χD(x)−Pk(x))−ϕk2)]≤δ(k)
Substituting the equation for ϕk we get
∀k∈I:12Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)]Eμk[wk(x)(2(χD(x)−Pk(x))−12Eμk[wk(x)(χD(x)−Pk(x))]Eμk[wk(x)])]≤δ(k)
∀k∈I:34Eμk[wk(x)(χD(x)−Pk(x))]2Eμk[wk(x)]≤δ(k)
∀k∈I:34Eμk[wk(x)]ϕ2k≤δ(k)
∀k∈I:ϕ2k≤43Eμk[wk(x)]−1δ(k)
∀k∈I:ϕ2k≤43μk{x∈{0,1}∗∣pk≤Pk(x)≤qk}−1δ(k)
Thus ϕk vanishes at infinity on I, which is a contradiction.
Lemma 3
Consider (D,μ) a distributional decision problem. If {Pk:suppμkcirc−−→[0,1]}k∈N is a quasi-optimal predictor for (D,μ) then there are c1,c2∈R and a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q we have
|Eμk[Q(x)(Pk(x)−χD(x))]|≤(c1+c2Eμk[Q(x)2])δ∗(k,|Q|)
Conversely, suppose M∈Q and {Pk:suppμkcirc−−→Q∩[−M,+M]}k∈N is a polynomial size family for which there is a quasinegligible function δ∗ s.t. for any Q:suppμkcirc−−→Q∩[−M−1,+M]}k∈N we have
|Eμk[Q(x)(Pk(x)−χD(x))]|≤δ∗(k,|Q|)
Define {~Pk:suppμkcirc−−→[0,1]}k∈N to be s.t. computing ~Pk(x) is equivalent to computing η(Pk(x)) rounded to k digits after the binary point. Then, ~P is a quasi-optimal predictor.
Proof of Lemma 3
Assume P is an optimal predictor. Consider Q:suppμkcirc−−→Q and t=σ2−a where σ∈{±1} and a∈N. The function η(Pk(x)+tQ(x)) can be approximated by a circuit of size p(k,|Q|) for some fixed polynomial p, within rounding error ϵk(x) s.t. ∀x∈suppμk:|ϵk(x)|≤2−k. By Lemma 1,
Eμk[(Pk(x)−χD(x))2]≤Eμk[(η(Pk(x)+tQ(x))+ϵk(x)−χD(x))2]+δ(k,|Q|)
where δ is quasinegligible.ϵ is bounded by a negligible function and therefore can be ignored by redefining δ. As in the proof of Theorem 1, η can be dropped.
Eμk[(Pk(x)−χD(x))2−(Pk(x)+tQ(x)−χD(x))2]≤δ(k,|Q|)
The expression on the left hand side is a quadratic polynomial in t. Explicitly:
−Eμk[Q(x)2]t2−2Eμk[Q(x)(Pk(x)−χD(x))]t≤δ(k,|Q|)
Moving Eμk[Q(x)2]t2 to the right hand side and dividing both sides by 2|t|=21−a we get
−Eμk[Q(x)(Pk(x)−χD(x))]σ≤2a−1δ(k,|Q|)+Eμk[Q(x)2]2−a−1
|Eμk[Q(x)(Pk(x)−χD(x))]|≤2a−1δ(k,|Q|)+Eμk[Q(x)2]2−a−1
Take a:=−12logδ(k,|Q|)+ϕ(k) where ϕ(k)∈[−12,+12] is the rounding error. We get
|Eμk[Q(x)(Pk(x)−χD(x))]|≤2ϕ(k)−1δ(k,|Q|)12+Eμk[Q(x)2]2−ϕ(k)−1δ(k,|Q|)12
Conversely, assume that for any R:suppμkcirc−−→Q∩[−M−1,+M]
|Eμk[R(x)(Pk(x)−χD(x))]|≤δ∗(k,|R|)
Consider Q:suppμkcirc−−→[0,1]. We have
Eμk[(Q(x)−χD(x))2]=Eμk[(Q(x)−Pk(x)+Pk(x)−χD(x))2]
Eμk[(Q(x)−χD(x))2]=Eμk[(Q(x)−Pk(x))2]+Eμk[(Pk(x)−χD(x))2]+2Eμk[(Q(x)−Pk(x))(Pk(x)−χD(x)]
2Eμk[(Pk(x)−Q(x))(Pk(x)−χD(x)]=Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]+Eμk[(Q(x)−Pk(x))2]
Pk(x)−Q(x) can be computed by a circuit R of size polynomial in |Q| and k. Applying the assumption we get
Eμk[(Pk(x)−χD(x))2]−Eμk[(Q(x)−χD(x))2]+Eμk[(Q(x)−Pk(x))2]≤~δ(k,|Q|)
where ~δ is quasinegligible. Noting that Eμk[(Q(x)−Pk(x))2]≥0 and (η(Pk(x))−χD(x))2≤(Pk(x)−χD(x))2 we get
Eμk[(η(Pk(x))−χD(x))2]−Eμk[(Q(x)−χD(x))2]≤~δ(k,|Q|)
Observing that ~P−η(P) is bounded by a negligible function, we get the desired result.
Proof of Theorem 2
Consider Q:suppμkcirc−−→Q. We have
Eμk[Q(x)(Pk1(x)+Pk2(x)−χD1∪D2(x))]=Eμk[Q(x)(Pk1(x)−χD1(x))]+Eμk[Q(x)(Pk2(x)−χD2(x))]
Using Lemma 3:
|Eμk[Q(x)(Pk1(x)−χD1(x))]|≤(c11+c12Eμk[Q(x)2])δ1(k,|Q|)
|Eμk[Q(x)(Pk2(x)−χD2(x))]|≤(c21+c22Eμk[Q(x)2])δ2(k,|Q|)
Therefore
|Eμk[Q(x)(Pk1(x)+Pk2(x)−χD1∪D2(x))]|≤(c11+c21+(c12+c22)Eμk[Q(x)2])(δ1(k,|Q|)+δ2(k,|Q|))
Using Lemma 3 again we get the desired result.
Proof of Theorem 4
We have
Pk((x1,x2))−χD1×D2((x1,x2))=(Pk1(x1)−χD1(x1))χD2(x2)+Pk1(x1)(Pk2(x2)−χD2(x2))
Therefore, for any Q:supp(μ1×μ2)kcirc−−→Q∩[−1,+1]
|E(μ1×μ2)k[Q(x)(Pk(x)−χD1×D2(x))]|≤|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|+|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|
By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have
|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|≤Eμk2[|Eμk1[χD2(x2)Q((x1,x2))(Pk1(x1)−χD1(x1))]|]
For any given x2, χD2(x2)Q((x1,x2)) can be computed by a circuit with input x1 of size polynomial in |x2| and |Q|. Applying Lemma 3 to P1, we get
|Eμk1×μk2[Q((x1,x2))(Pk1(x1)−χD1(x1))χD2(x2)]|≤Eμk2[δ1(k,p1(|x2|,|Q|))]
where p1 is a polynomial and δ1 is quasinegligible. Since |x2| is bounded by a polynomial in k for x2∈suppμk2, we get the bound we need.
For the second term, we have
|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|≤Eμk1[|Eμk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|]
For any given x1, Q((x1,x2))Pk1(x1) can be computed by a circuit with input x1 of size polynomial in k, |x1| and |Q|. Applying Lemma 3 to P2, we get
|Eμk1×μk2[Q((x1,x2))Pk1(x1)(Pk2(x2)−χD2(x2))]|≤Eμk1[δ2(k,p2(k,|x1|,|Q|))]
Again, we got the required bound.
Proof of Theorem 7
Assume Q is a quasi-optimal predictor. Applying Lemma 3 to predictor P and circuits computing Pk−Qk, we get
|Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]|≤δ(k)
for some δ vanishing at infinity. Applying Lemma 3 to predictor Q and circuits computing Pk−Qk, we get
|Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]|≤ϵ(k)
for some ϵ vanishing at infinity. We have
Eμk[(Pk(x)−Qk(x))2]=Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]−Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]
Eμk[(Pk(x)−Qk(x))2]≤|Eμk[(Pk(x)−Qk(x))(Pk(x)−χD(x))]|+|Eμk[(Pk(x)−Qk(x))(Qk(x)−χD(x))]|
Eμk[(Pk(x)−Qk(x))2]≤δ(k)+ϵ(k)
Conversely, assume Pμ≈Q. Consider some R:suppμkcirc−−→[0,1]. We have
Eμk[R(x)(Qk(x)−χD(x))]=Eμk[R(x)(Qk(x)−Pk(x)+Pk(x)−χD(x))]
Eμk[R(x)(Qk(x)−χD(x))]=Eμk[R(x)(Qk(x)−Pk(x))]+Eμk[R(x)(Pk(x)−χD(x))]
|Eμk[R(x)(Qk(x)−Pk(x))]|≤Eμk[|Qk(x)−Pk(x)|]≤√Eμk[(Qk(x)−Pk(x))2]≤δ(k)
for some δ vanishing at infinity, since Pμ≈Q.
|Eμk[R(x)(Pk(x)−χD(x))]|≤δ∗(k,|R|)
for some quasinegligible δ∗, using Lemma 3. Combining both inequalities we get
|Eμk[R(x)(Qk(x)−χD(x))]|≤δ(k)+δ∗(k,|R|)
Using Lemma 3 again we conclude Q is a quasi-optimal predictor.
Lemma 4
Consider C⊆D⊆{0,1}∗ and μ a word ensemble. Assume PC is a quasi-optimal predictor for (C,μ) and PD is a quasi-optimal predictor for (D,μ). Define
ϵk(x):=θ(PkC(x)−PkD(x))(PkC(x)−PkD(x))
Then, limk→∞Eμk[ϵk(x)]=0.
Proof of Lemma 4
By Theorem 3 and Lemma 3 there is a quasinegligible function δ such that for any Q:suppμkcirc−−→Q∩[−2,+1] we have
|Eμk[Q(x)(PkD(x)−PkC(x)−χD∖C(x))]|≤δ(k,|Q|)
Take Q to be the circuit computing θ(PkC(x)−PkD(x)). Its size is polynomial in k therefore
|Eμk[θ(PkC(x)−PkD(x))(PkD(x)−PkC(x)−χD∖C(x))]|≤δ∗(k)
where δ∗ vanishes at infinity.
|Eμk[ϵk(x)]+Eμk[θ(PkC(x)−PkD(x))χD∖C(x)]|≤δ∗(k)
Since both terms inside the absolute value are non-negative we get the desired result.
Proof of Theorem 6
When PkD(x)>0 we have
PkC∣D(x)=min(PkC∩D(x),PkD(x))PkD(x)
Define ~PkC∩D to be the circuit computing min(PkC∩D(x),PkD(x)). Since C∩D⊆D, Lemma 4 implies that limk→∞Eμk[PkC∩D(x)−~PkC∩D(x)]=0. This implies limk→∞Eμk[(PkC∩D(x)−~PkC∩D(x))2]=0 and by Theorem 7 ~PC∩D is a quasi-optimal predictor for (C∩D,μ).
We have ~PkC∩D(x)=PkC∣D(x)PkD(x) (whether PkD(x)>0 or PkD(x)=0) and therefore
~PkC∩D(x)−χC∩D(x)=(PkC∣D(x)−χC(x))χD(x)+PkC∣D(x)(PkD(x)−χD(x))
(PkC∣D(x)−χC(x))χD(x)=~PkC∩D(x)−χC∩D(x)−PkC∣D(x)(PkD(x)−χD(x))
Consider Q:suppμkcirc−−→Q∩[−1,+1].
Eμk∣D[Q(x)(PkC∣D(x)−χC(x))]=μk(D)−1Eμk[Q(x)(PkC∣D(x)−χC(x))χD(x)]
By Lemma 3 it is sufficient to prove appropriate bounds on |Eμk[Q(x)(~PkC∩D(x)−χC∩D(x))]| and |Eμk[Q(x)PkC∣D(x)(PkD(x)−χD(x))]|. Both bounds follow from Lemma 3 using the facts ~PC∩D and PD are quasi-optimal predictors and |PkC∣D| is bounded by a polynomial.
Proof of Theorem 8
Consider k∈N, QC:suppμkcirc−−→[0,1]. Define QD:suppνp(k)×{0,1}q(k)circ−−→[0,1] to be the circuit computing QD(y,r):=QC(gk(y,r)). Applying Lemma 2, treating r as a constant and using R as the weight circuit, we get
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]≤Eνp(k)[Rk(y)(QD(y,r)−χD(y))2]+δ(k,|QC|)
where δ is quasinegligible. We used condition (ii) to get a constant bound on maxRk and condition (iv) to get a polynomial bound on |Rk|.
We take the expectation value of both sides with respect to the uniform measure over r:
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]≤Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]+δ(k,|QC|)
The left hand side can be rewritten as follows
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗νp(k)(y)μk((fk)−1(y))νp(k)(y)(Pp(k)D(y)−χD(y))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗μk((fk)−1(y))(Pp(k)D(y)−χD(y))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑y∈{0,1}∗∑x∈suppμkfk(x)=yμk(x)(Pp(k)D(y)−χD(y))2
Grouping the sum by x, we get
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=∑x∈suppμkμk(x)(PkC(x)−χC(x))2
Eνp(k)[Rk(y)(Pp(k)D(y)−χD(y))2]=Eμk[(PkC(x)−χC(x))2]
The first term on the right hand side can be rewritten as
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑y∈{0,1}∗∑r∈{0,1}q(k)2−q(k)μk((fk)−1(y))(QD(y,r)−χD(y))2
Grouping the sum by x:=g(y,r) we get:
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑x∈{0,1}∗∑y∈{0,1}∗∑r∈{0,1}q(k)gk(y,r)=x2−q(k)μk((fk)−1(y))(QC(x)−χC(x))2
Condition (iii) tells us that ∑r∈{0,1}q(k)gk(y,r)=x2−q(k) is only non-vanishing when y=fk(x) and that in this case it equals μk(x)μk((fk)−1(y)). Therefore
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=∑x∈{0,1}∗μk(x)(QC(x)−χC(x))2
Eνp(k)×Uq(k)[Rk(y)(QD(y,r)−χD(y))2]=Eμk[(QC(x)−χC(x))2]
Putting everything together, we get
Eμk[(PkC(x)−χC(x))2]≤Eμk[(QC(x)−χC(x))2]+δ(k,|QC|)
Proof of Theorem 9
Assume to the contrary that Pf is not quasi-optimal. Then there is an infinite set I⊆N, a polynomial size family of circuits {Qk:supp~μkfcirc−−→[0,1]}k∈I and ϵ>0 s.t.
∀k∈I:E~μkf[(Pkf(x)−χDf(x)2)]≥E~μkf[(Qk(x)−χDf(x))2]+ϵ
∀k∈I:E~μkf[(Qk(x)−χDf(x))2]≤14−ϵ
Define the functions {qk:supp~μkf×[0,1]→{0,1}}k∈I by qk(x,t):=θ(Qk(x)−t). We have
∀k∈I,x∈supp~μkf:Qk(x)=∫10qk(x,t)dt
Substituting into the inequality above
∀k∈I:E~μkf[(∫10qk(x,t)dt−χDf(x))2]≤14−ϵ
∀k∈I:E~μkf[|∫10qk(x,t)dt−χDf(x)|]2≤14−ϵ
∀k∈I:E~μkf[|∫10(qk(x,t)−χDf(x))dt|]≤√14−ϵ
For every given x, qk(x,t)−χDf(x) is either non-negative for all t or non-positive for t. Hence we can move the absolute value inside the integral:
∀k∈I:E~μkf[∫10|qk(x,t)−χDf(x)|dt]≤√14−ϵ
∀k∈I:∫10E~μkf[|qk(x,t)−χDf(x)|]dt≤√14−ϵ
This implies that we can choose {tk∈Qk(supp~μkf)∪{0,1}}k∈I s.t.
∀k∈I:E~μkf[|qk(x,tk)−χDf(x)|]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)≠χDf(x)]≤√14−ϵ
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥1−√14−ϵ
Using the fact that the graph of the square root lies below its tangent at any point, this leads to
∀k∈I:Pr~μkf[qk(x,tk)=χDf(x)]≥12+ϵ
Define {gk:f({0,1}k)×{0,1}kcirc−−→{0,1}}k∈N as the circuits computing gk(y,r):=1−qk((y,r),tk). The definitions of qk and tk imply that |gk| is bounded by a polynomial. The inequality above and the definitions of Df and ~μf imply
∀k∈I:1k∑i<kPrUi×Ui[gi(f(x),r)=x⋅r]≥12+ϵ
But this contradicts the assumption on f.
Note that this argument doesn’t show Pf is optimal since while the averaging over i preserves the property of vanishing at infinity, it doesn’t preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for (Df,~μf) exists.