Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
In the following, all infradistributions are crisp.
Fix a finite action set $A$ and a finite observation set $O$. For any $k \in \mathbb{N}$ and $\gamma \in (0,1)$, let

$$M^k_\gamma \colon (A \times O)^\omega \to \Delta\big((A \times O)^k\big)$$

be defined by

$$M^k_\gamma(h \mid d) := (1 - \gamma) \sum_{n=0}^{\infty} \gamma^n \, [[h = d_{n:n+k}]]$$

In other words, this kernel samples a time step $n$ from the geometric distribution with parameter $\gamma$, and then produces the length-$k$ sequence that appears in the destiny $d$ starting at position $n$.
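To make the sampling concrete, here is a minimal Python sketch of drawing from $M^k_\gamma(\cdot \mid d)$. The representation of a destiny as an indexable sequence of (action, observation) pairs, and the function name, are illustrative assumptions rather than anything fixed by the definition above.

```python
import random

def sample_window(destiny, k, gamma, rng=random):
    # Draw n with P(n) = (1 - gamma) * gamma^n: the geometric distribution
    # on {0, 1, 2, ...} with continuation probability gamma.
    n = 0
    while rng.random() < gamma:
        n += 1
    # Return the length-k window d_{n:n+k}. A true destiny is infinite;
    # here we assume `destiny` is simply long enough for the sampled n.
    return tuple(destiny[n:n + k])
```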
For any continuous[1] function $D \colon \square(A \times O)^k \to \mathbb{R}$, we get a decision rule. Namely, this rule says that, given an infra-Bayesian law $\Lambda$ and discount parameter $\gamma$, the optimal policy is

$$\pi^*_{D\Lambda} := \underset{\pi \colon O^* \to A}{\operatorname{argmax}} \; D\big(M^k_{\gamma*} \Lambda(\pi)\big)$$
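As a sketch of the shape of such a rule (assuming, purely for illustration, a finite collection of candidate policies and a `pushforward` function standing in for $M^k_{\gamma*} \Lambda(\pi)$):

```python
def optimal_policy(policies, pushforward, D):
    # Generic decision rule: evaluate D on the pushed-forward
    # infradistribution of each candidate policy and keep the best.
    # `pushforward(pi)` stands in for the credal set M^k_{gamma*} Lambda(pi).
    return max(policies, key=lambda pi: D(pushforward(pi)))
```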
The usual maximin is recovered when we have some reward function $r \colon (A \times O)^k \to \mathbb{R}$ and take the corresponding

$$D_r(\Theta) := \min_{\theta \in \Theta} \mathbb{E}_\theta[r]$$
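For a crisp infradistribution represented, as a finite illustrative approximation, by a list of probability dictionaries over outcomes in $(A \times O)^k$, $D_r$ is just a worst-case expectation:

```python
def D_r(credal_set, r):
    # Worst-case expected reward over the credal set: each theta is a dict
    # mapping outcomes to probabilities, and r maps outcomes to rewards.
    return min(sum(theta[h] * r[h] for h in theta) for theta in credal_set)

# Two candidate distributions over outcomes {"a", "b"}:
credal = [{"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}]
reward = {"a": 1.0, "b": 0.0}
print(D_r(credal, reward))  # 0.5 -- the first distribution is the worst case
```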
A set $H$ of laws is said to be learnable w.r.t. $D$ when there is a family of policies $\{\pi_\gamma\}_{\gamma \in (0,1)}$ such that for any $\Lambda \in H$

$$\lim_{\gamma \to 1} \Big( \max_\pi D\big(M^k_{\gamma*} \Lambda(\pi)\big) - D\big(M^k_{\gamma*} \Lambda(\pi_\gamma)\big) \Big) = 0$$
For $D_r$ we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any $t \in [0,1]$ we have the learnable decision rule

$$D^t_r := t \max_{\theta \in \Theta} \mathbb{E}_\theta[r] + (1 - t) \min_{\theta \in \Theta} \mathbb{E}_\theta[r]$$

This is the "mesomism" I talked about before.
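In the same finite-credal-set sketch as above, mesomism simply blends the best and worst cases:

```python
def D_t_r(credal_set, r, t):
    # Mesomism: a t-weighted blend of the best-case and worst-case expected
    # reward over the credal set (t = 0 recovers the maximin rule D_r).
    values = [sum(theta[h] * r[h] for h in theta) for theta in credal_set]
    return t * max(values) + (1 - t) * min(values)
```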
Also, any monotonically increasing $D$ seems to be learnable, i.e. any $D$ s.t. for $\Theta_1 \subseteq \Theta_2$ we have $D(\Theta_1) \le D(\Theta_2)$. For such decision rules, you can essentially assume that "nature" (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.
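For instance, the pure best-case rule $D(\Theta) = \max_{\theta \in \Theta} \mathbb{E}_\theta[r]$ is monotone, since enlarging the credal set can only raise the maximum. In the same illustrative sketch:

```python
def D_max(credal_set, r):
    # A monotone decision rule: best-case expected reward. Adding more
    # distributions to the credal set can only increase the max, which is
    # the "collaborative nature" picture described above.
    return max(sum(theta[h] * r[h] for h in theta) for theta in credal_set)
```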
On the other hand, decision rules of the form $D_{r_1} + D_{r_2}$ are not learnable in general, and neither are decision rules of the form $D_r + D'$ for $D'$ monotonically increasing.
Open Problem: Are there any learnable decision rules that are neither mesomisms nor monotonically increasing?
A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.
[1] We can try considering discontinuous functions as well, but it seems natural to start with continuous. If we want the optimal policy to exist, we usually need $D$ to be at least upper semicontinuous.

[2] There are weaker conditions than "communicating" that are sufficient, e.g. "resettable" (meaning that the agent can always force a return to the initial state), and some even weaker conditions that I will not spell out here.
[3] I mean theorems like VNM, Savage, etc.