Is it possible to replace the maximin decision rule in infra-Bayesianism with a different decision rule? One surprisingly strong desideratum for such decision rules is the learnability of some natural hypothesis classes.
In the following, all infradistributions are crisp.
Fix a finite action set $A$ and a finite observation set $O$. For any $k \in \mathbb{N}$ and $\gamma \in (0,1)$, let

$$M^k_\gamma \colon (A \times O)^\omega \to \Delta\big((A \times O)^k\big)$$

be defined by

$$M^k_\gamma(h \mid d) := (1 - \gamma) \sum_{n=0}^{\infty} \gamma^n \, [[h = d_{n:n+k}]]$$

In other words, this kernel samples a time step $n$ from the geometric distribution with parameter $\gamma$, and then produces the length-$k$ sequence that appears in the destiny $d$ starting at position $n$.
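To make the sampling concrete, here is a minimal Python sketch of drawing from $M^k_\gamma(\cdot \mid d)$. The representation of a destiny as an indexable sequence of (action, observation) pairs, and the function name, are illustrative assumptions rather than anything fixed by the definition above.

```python
import random

def sample_window(destiny, k, gamma, rng=random):
    # Draw n with P(n) = (1 - gamma) * gamma^n: the geometric distribution
    # on {0, 1, 2, ...} with continuation probability gamma.
    n = 0
    while rng.random() < gamma:
        n += 1
    # Return the length-k window d_{n:n+k}. A true destiny is infinite;
    # here we assume `destiny` is simply long enough for the sampled n.
    return tuple(destiny[n:n + k])
```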
For any continuous[1] function $D \colon \square(A \times O)^k \to \mathbb{R}$, we get a decision rule. Namely, this rule says that, given an infra-Bayesian law $\Lambda$ and discount parameter $\gamma$, the optimal policy is

$$\pi^*_{D\Lambda} := \underset{\pi \colon O^* \to A}{\operatorname{argmax}} \; D\big(M^k_{\gamma*} \Lambda(\pi)\big)$$
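As a sketch of the shape of such a rule (assuming, purely for illustration, a finite collection of candidate policies and a `pushforward` function standing in for $M^k_{\gamma*} \Lambda(\pi)$):

```python
def optimal_policy(policies, pushforward, D):
    # Generic decision rule: evaluate D on the pushed-forward
    # infradistribution of each candidate policy and keep the best.
    # `pushforward(pi)` stands in for the credal set M^k_{gamma*} Lambda(pi).
    return max(policies, key=lambda pi: D(pushforward(pi)))
```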
The usual maximin is recovered when we have some reward function $r \colon (A \times O)^k \to \mathbb{R}$ and take the corresponding

$$D_r(\Theta) := \min_{\theta \in \Theta} \mathbb{E}_\theta[r]$$
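For a crisp infradistribution represented, as a finite illustrative approximation, by a list of probability dictionaries over outcomes in $(A \times O)^k$, $D_r$ is just a worst-case expectation:

```python
def D_r(credal_set, r):
    # Worst-case expected reward over the credal set: each theta is a dict
    # mapping outcomes to probabilities, and r maps outcomes to rewards.
    return min(sum(theta[h] * r[h] for h in theta) for theta in credal_set)

# Two candidate distributions over outcomes {"a", "b"}:
credal = [{"a": 0.5, "b": 0.5}, {"a": 0.9, "b": 0.1}]
reward = {"a": 1.0, "b": 0.0}
print(D_r(credal, reward))  # 0.5 -- the first distribution is the worst case
```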
A set $H$ of laws is said to be learnable w.r.t. $D$ when there is a family of policies $\{\pi_\gamma\}_{\gamma \in (0,1)}$ such that for any $\Lambda \in H$

$$\lim_{\gamma \to 1} \Big( \max_\pi D\big(M^k_{\gamma*} \Lambda(\pi)\big) - D\big(M^k_{\gamma*} \Lambda(\pi_\gamma)\big) \Big) = 0$$
For $D_r$ we know that e.g. the set of all communicating[2] finite infra-RDPs is learnable. More generally, for any $t \in [0,1]$ we have the learnable decision rule

$$D^t_r := t \max_{\theta \in \Theta} \mathbb{E}_\theta[r] + (1 - t) \min_{\theta \in \Theta} \mathbb{E}_\theta[r]$$

This is the "mesomism" I talked about before.
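In the same finite-credal-set sketch as above, mesomism simply blends the best and worst cases:

```python
def D_t_r(credal_set, r, t):
    # Mesomism: a t-weighted blend of the best-case and worst-case expected
    # reward over the credal set (t = 0 recovers the maximin rule D_r).
    values = [sum(theta[h] * r[h] for h in theta) for theta in credal_set]
    return t * max(values) + (1 - t) * min(values)
```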
Also, any monotonically increasing $D$ seems to be learnable, i.e. any $D$ s.t. for $\Theta_1 \subseteq \Theta_2$ we have $D(\Theta_1) \le D(\Theta_2)$. For such decision rules, you can essentially assume that "nature" (i.e. whatever resolves the ambiguity of the infradistributions) is collaborative with the agent. These rules are not very interesting.
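For instance, the pure best-case rule $D(\Theta) = \max_{\theta \in \Theta} \mathbb{E}_\theta[r]$ is monotone, since enlarging the credal set can only raise the maximum. In the same illustrative sketch:

```python
def D_max(credal_set, r):
    # A monotone decision rule: best-case expected reward. Adding more
    # distributions to the credal set can only increase the max, which is
    # the "collaborative nature" picture described above.
    return max(sum(theta[h] * r[h] for h in theta) for theta in credal_set)
```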
On the other hand, decision rules of the form $D_{r_1} + D_{r_2}$ are not learnable in general, and neither are decision rules of the form $D_r + D'$ for $D'$ monotonically increasing.
Open Problem: Are there any learnable decision rules that are neither mesomisms nor monotonically increasing?
A positive answer to the above would provide interesting generalizations of infra-Bayesianism. A negative answer to the above would provide an interesting novel justification of the maximin. Indeed, learnability is not a criterion that was ever used in axiomatic constructions of decision theory[3], AFAIK.
[1] We can try considering discontinuous functions as well, but it seems natural to start with continuous. If we want the optimal policy to exist, we usually need $D$ to be at least upper semicontinuous.

[2] There are weaker conditions than "communicating" that are sufficient, e.g. "resettable" (meaning that the agent can always force a return to the initial state), and some even weaker conditions that I will not spell out here.
[3] I mean theorems like VNM, Savage, etc.