Here is a way to construct many learnable undogmatic ontologies, including some with finite state spaces.
A deterministic partial environment (DPE) over action set A and observation set O is a pair (D,ϕ) where D⊆(O×A)∗ and ϕ:D→O s.t.
If h∈(O×A)∗ is a prefix of some g∈D, then h∈D.
If h,g∈D, p∈O and hp is a prefix of g, then ϕ(h)=p.
DPEs are equipped with a natural partial order. Namely, (D,ϕ)≤(E,ψ) when D⊆E and ϕ=ψ|D.
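As an illustrative sketch (not part of the post), a finite DPE can be modeled in Python as a dict mapping histories, represented as tuples of (observation, action) pairs, to next observations; the two axioms and the partial order then become direct checks. All names here (`is_dpe`, `leq`) are hypothetical.

```python
# A finite DPE (D, phi) as a dict: keys are histories in (O x A)*,
# encoded as tuples of (obs, act) pairs; values are phi's next observations.

def is_dpe(phi):
    """Check the two DPE axioms for a finite phi given as a dict."""
    D = set(phi)
    for g in D:
        for k in range(len(g)):
            h = g[:k]
            # Axiom 1: D is prefix-closed (every prefix in (O x A)* of g is in D).
            if h not in D:
                return False
            # Axiom 2: the observation recorded right after h inside g
            # (the first component of the pair g[k]) must agree with phi(h).
            if phi[h] != g[k][0]:
                return False
    return True

def leq(phi1, phi2):
    """The natural partial order: (D1, phi1) <= (D2, phi2) iff D1 is a
    subset of D2 and phi2 restricted to D1 equals phi1."""
    return all(h in phi2 and phi2[h] == phi1[h] for h in phi1)
```

For example, `{(): "o"} <= {(): "o", (("o","a"),): "o"}` under `leq`, while a dict whose domain is not prefix-closed fails `is_dpe`.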
Let S be a strong upwards antichain in the DPE poset which doesn’t contain the bottom DPE (i.e. the DPE with D=∅). Then, it naturally induces an infra-POMDP. Specifically:
The state space is S.
The initial infradistribution is ⊤S.
The observation mapping is ω(D,ϕ):=ϕ(ϵ), where ϵ is the empty history.
The transition infrakernel is T(D,ϕ;a):=⊤N(D,ϕ;a), where
N(D,ϕ;a):={(E,ψ)∈S | ∀h∈(O×A)∗: ϕ(ϵ)ah∈D ⟹ h∈E ∧ ψ(h)=ϕ(ϕ(ϵ)ah)}
If N(D,ϕ;a) is non-empty for all (D,ϕ)∈S and a∈A, this is a learnable undogmatic ontology.
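As an illustrative sketch (not from the post), the successor set N(D,ϕ;a) can be computed directly for finite DPEs, again representing a DPE as a dict from histories (tuples of (observation, action) pairs) to next observations. The name `successors` is hypothetical.

```python
def successors(phi, a, S):
    """Compute N(D, phi; a): the states (E, psi) in the candidate set S such
    that for every history h with phi(eps)*a*h in D, we have h in E and
    psi(h) = phi(phi(eps)*a*h)."""
    o = phi[()]                      # phi(eps): the first observation
    result = []
    for psi in S:
        ok = True
        for g in phi:
            # Only histories of the form (o, a) * h in D impose constraints.
            if g and g[0] == (o, a):
                h = g[1:]            # the tail after the first (obs, act) pair
                if h not in psi or psi[h] != phi[g]:
                    ok = False
                    break
        if ok:
            result.append(psi)
    return result
```

With this, the non-emptiness hypothesis for a finite S can be verified by checking `successors(phi, a, S)` for every state and action.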
Any n∈N yields an example Sn. Namely, (D,ϕ)∈Sn iff D≠∅ and for any h∈D it holds that:
|h|≤n
If |h|<n then for any a∈A, hϕ(h)a∈D.
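As an illustrative sketch (not from the post), membership in Sₙ can be checked mechanically for finite DPEs, assuming the dict representation of a DPE (histories as tuples of (observation, action) pairs) and the convention that a history h∈D extends by one step to h·ϕ(h)·a. The name `in_S_n` is hypothetical.

```python
def in_S_n(phi, n, actions):
    """Test membership of a finite DPE (D, phi) in S_n: D is non-empty,
    every history in D has length at most n, and every history shorter
    than n extends by one step for every action."""
    if not phi:                      # D must be non-empty
        return False
    for h in phi:
        if len(h) > n:
            return False
        if len(h) < n:
            # h must extend via the pair (phi(h), a) for each action a.
            for a in actions:
                if h + ((phi[h], a),) not in phi:
                    return False
    return True
```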
I think that for some non-trivial hidden reward functions over such an ontology, the class of communicating RUMDPs is learnable. If the hidden reward function doesn’t depend on the action argument, it’s equivalent to some instrumental reward function.