Vanessa Kosoy comments on Vanessa Kosoy’s Shortform

Vanessa Kosoy 24 Jul 2023 9:56 UTC
Master post for ideas about metacognitive agents.
- Vanessa Kosoy 21 Apr 2024 14:43 UTC
  Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to “plain” IBRL regret bounds when we consider the core and the envelope as the “inside” of the agent.
  Assume that the action and observation sets factor as $A = A_{0} \times A_{1}$ and $O = O_{0} \times O_{1}$ , where $(A_{0}, O_{0})$ is the interface with the external environment and $(A_{1}, O_{1})$ is the interface with the envelope.
  Let $Λ : Π \to □ (Γ \times (A \times O)^{ω})$ be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:
  - Marginalizing over $Γ$. That is, let $\mathrm{pr}_{-Γ} : Γ \times (A \times O)^{ω} \to (A \times O)^{ω}$ and $\mathrm{pr}_{0} : (A \times O)^{ω} \to (A_{0} \times O_{0})^{ω}$ be the projections. Then, we have the law $Λ^{?} := (\mathrm{pr}_{0}\, \mathrm{pr}_{-Γ})_{*} \circ Λ$.
  - Assuming “logical omniscience”. That is, let $τ^{*} \in Γ$ be the ground truth. Then, we have the law $Λ^{!} := \mathrm{pr}_{0 *} (Λ ∣ τ^{*})$. Here, we use the conditional defined by $Θ ∣ A := \{(θ ∣ A) ∣ θ \in \arg\max_{Θ} \Pr[A]\}$. It’s easy to see this indeed defines a law.
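A minimal sketch of the conditional $Θ ∣ A$ used above, assuming a toy setting that is not in the original: the credal set is replaced by a finite list of distributions over a finite outcome set, and the names `event_probability`, `condition_distribution` and `max_likelihood_conditional` are hypothetical. It keeps only the distributions that assign maximal probability to the event and conditions each of them in the ordinary way.

```python
from fractions import Fraction


def event_probability(theta, event):
    """Probability that the finite distribution `theta` assigns to `event`."""
    return sum(p for outcome, p in theta.items() if outcome in event)


def condition_distribution(theta, event):
    """Ordinary conditional distribution theta | A on a finite outcome set."""
    mass = event_probability(theta, event)
    if mass == 0:
        raise ValueError("cannot condition on a probability-zero event")
    return {x: p / mass for x, p in theta.items() if x in event}


def max_likelihood_conditional(credal_set, event):
    """Theta | A: condition only the members that maximize Pr[A].

    Assumption: `credal_set` is a finite list standing in for a
    (generally infinite, convex) credal set.
    """
    best = max(event_probability(t, event) for t in credal_set)
    return [
        condition_distribution(t, event)
        for t in credal_set
        if event_probability(t, event) == best
    ]


# Toy data (hypothetical): two distributions over outcomes x, y, z.
theta1 = {"x": Fraction(1, 2), "y": Fraction(1, 4), "z": Fraction(1, 4)}
theta2 = {"x": Fraction(1, 4), "y": Fraction(1, 4), "z": Fraction(1, 2)}
conditioned = max_likelihood_conditional([theta1, theta2], {"x", "y"})
```

Here `theta1` assigns the event probability 3/4 and `theta2` assigns 1/2, so only the conditional of `theta1` survives. Exact `Fraction` arithmetic is used so that the comparison against the maximum is not affected by rounding.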
  However, low regret w.r.t. either of these reduced laws is not equivalent to low regret w.r.t. $Λ$:
  - Learning $Λ^{?}$ is typically no less feasible than learning $Λ$, but it is a much weaker condition. This is because metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
  - Learning $Λ^{!}$ is a much stronger condition than learning $Λ$, but it is typically infeasible. Requiring it leads to AIXI-like agents.
  Therefore, metacognitive regret bounds hit a “sweet spot” of strength vs. feasibility, which produces genuinely more powerful agents than IBRL^[1].
  1. ^
    More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDP). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crisp laws, since there’s a very general theorem for representing desiderata as laws. But these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
- Vanessa Kosoy 25 Mar 2024 1:27 UTC
  Formalizing the richness of mathematics
  Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Gödel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
  Here is my proposal for how to formulate a theorem that would make this idea rigorous.
  (Wrong) First Attempt
  Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis $Θ$ represents an infradistribution over $Γ$: the “space of counterpossible computational universes”. We can say that $Θ$ is a “true hypothesis” when there is some $θ$ in the credal set $Θ$ (a distribution over $Γ$) s.t. the ground truth $Υ^{*} \in Γ$ “looks” as if it’s sampled from $θ$. The latter should be formalizable via something like a computationally bounded version of Martin-Löf randomness.
  We can now try to say that $Υ^{*}$ is “rich” if for any true hypothesis $Θ$ , there is a refinement $Ξ \subseteq Θ$ which is also a true hypothesis and “knows” at least one bit of information that $Θ$ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes $Υ^{*}$ . But, it’s also completely boring: the required $Ξ$ can be constructed by “hardcoding” an additional fact into $Θ$ . This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
  (Wrong) Second Attempt
  What if instead we require that $Ξ$ knows infinitely many bits of information that $Θ$ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the $n$-th bit in the sequence depends only on the largest $k$ s.t. $2^{k}$ divides $n$, but the dependence on $k$ is already uncomputable (or at least inexpressible by a finite automaton).
  However, for our original application, this is entirely insufficient. This is because the formal language we use to define $Γ$ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
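The sequence-prediction example can be made concrete with a small sketch. Everything here is an illustrative assumption: `two_adic_valuation`, `sequence_prefix` and `positions_with_valuation` are hypothetical names, and the placeholder `parity_rule` is a computable stand-in for the dependence on $k$, which in the example itself is uncomputable. The point is only the shape of the sequence: for each fixed $k$, all positions $n$ with $n \equiv 2^{k} \pmod{2^{k+1}}$ carry the same bit, and each such single regularity depends only on the low $k+1$ binary digits of $n$.

```python
def two_adic_valuation(n):
    """Largest k such that 2**k divides n, for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def sequence_prefix(length, rule):
    """Bits 1..length of the sequence whose n-th bit is rule(valuation of n)."""
    return [rule(two_adic_valuation(n)) for n in range(1, length + 1)]


def positions_with_valuation(k, length):
    """Positions n <= length covered by the k-th regularity.

    These are exactly the n with n mod 2**(k+1) == 2**k.
    """
    return [n for n in range(1, length + 1) if n % 2 ** (k + 1) == 2 ** k]


def parity_rule(k):
    """Placeholder rule (assumption): any map from k to a bit would do here."""
    return k % 2


valuations = [two_adic_valuation(n) for n in range(1, 9)]
bits = sequence_prefix(8, parity_rule)
```

With the placeholder rule, every individual regularity ("positions with valuation $k$ all carry bit $k \bmod 2$") is easy to state, while the full sequence is determined only by knowing the rule for every $k$.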
  (Less Wrong) Third Attempt
  Here’s the improved condition: For any true hypothesis $Θ$ , there is a true refinement $Ξ \subseteq Θ$ s.t. conditioning $Θ$ on any finite set of observations cannot produce a refinement of $Ξ$ .
  There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
  - For any given observation $A$ and $p \in (0, 1]$, we can form $\{θ \in Θ ∣ θ(A) \geq p\}$.
  - For any given observation $A$ s.t. $\min_{θ \in Θ} θ(A) > 0$, we can form $\{(θ ∣ A) ∣ θ \in Θ\}$.
  This rules out the counterexample from before: the easy equivalence relation can be represented inside $Θ$, and then the entire sequence of “novel” bits can be generated by a conditioning.
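The two types of “conditioning” for credal sets listed above can be sketched on a toy example. This is a minimal illustration under assumptions that are not in the original: the credal set is represented by a finite list of distributions over a finite outcome set, and the function names `restrict_by_threshold` and `condition_pointwise` are hypothetical.

```python
from fractions import Fraction


def prob(theta, event):
    """Probability that the finite distribution `theta` assigns to `event`."""
    return sum(q for outcome, q in theta.items() if outcome in event)


def restrict_by_threshold(credal_set, event, p):
    """First type: keep the members theta with theta(A) >= p, for p in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return [theta for theta in credal_set if prob(theta, event) >= p]


def condition_pointwise(credal_set, event):
    """Second type: replace every member theta by theta | A.

    Only allowed when every member gives the event positive probability.
    """
    if min(prob(theta, event) for theta in credal_set) <= 0:
        raise ValueError("some member assigns the event probability zero")
    return [
        {x: q / prob(theta, event) for x, q in theta.items() if x in event}
        for theta in credal_set
    ]


# Toy data (hypothetical): two distributions over outcomes a, b, c.
theta1 = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
theta2 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
credal = [theta1, theta2]
event = {"a", "b"}

restricted = restrict_by_threshold(credal, event, Fraction(3, 4))
conditioned = condition_pointwise(credal, event)
```

In this toy case the threshold operation discards `theta2` (which gives the event probability 1/2), while pointwise conditioning keeps both members and renormalizes each on the event.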
  Alright, so does $Υ^{*}$ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.
- Vanessa Kosoy 4 Aug 2023 5:07 UTC
  Recording of a talk I gave in VAISU 2023.
- Vanessa Kosoy 24 Jul 2023 11:07 UTC
  Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
  Consider some (unlearnable) prior $ζ$ over environments, s.t. we can efficiently compute the distribution $ζ (h)$ over observations given any history $h$ . For example, any prior over a small set of MDP hypotheses would qualify. Now, for each $h$ , we regard $ζ (h)$ as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” $ξ$ consisting of metahypotheses: hypotheses-about-programs.
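To illustrate what “efficiently compute the distribution $ζ (h)$” can mean for a prior over a small set of MDP hypotheses, here is a minimal sketch. The representation is an assumption made for illustration: each hypothesis is a table from (state, action) to a distribution over next states, the observation is the next state, the history is a list of (action, observation) pairs, and the next action is passed as a separate argument. The names `likelihood` and `predictive` and the two toy MDPs are hypothetical, and the toy prior is not meant to be unlearnable; it only shows the computation.

```python
from fractions import Fraction


def likelihood(mdp, initial_state, history):
    """Probability that `mdp` assigns to the observations in `history`."""
    state = initial_state
    weight = Fraction(1)
    for action, observation in history:
        weight *= mdp[(state, action)].get(observation, Fraction(0))
        state = observation  # assumption: the observation is the next state
    return weight


def predictive(prior, initial_state, history, action):
    """Distribution over the next observation under the Bayesian mixture.

    `prior` is a list of (weight, mdp) pairs. The cost is linear in the
    number of hypotheses and in the length of the history.
    """
    state = history[-1][1] if history else initial_state
    joint = [(w * likelihood(m, initial_state, history), m) for w, m in prior]
    total = sum(w for w, _ in joint)
    if total == 0:
        raise ValueError("history has probability zero under the prior")
    result = {}
    for w, m in joint:
        for observation, p in m[(state, action)].items():
            result[observation] = result.get(observation, Fraction(0)) + w * p / total
    return result


# Two toy MDP hypotheses (hypothetical) over states {0, 1} with one action.
mdp_sticky = {
    (0, "go"): {0: Fraction(3, 4), 1: Fraction(1, 4)},
    (1, "go"): {0: Fraction(1, 2), 1: Fraction(1, 2)},
}
mdp_jumpy = {
    (0, "go"): {0: Fraction(1, 4), 1: Fraction(3, 4)},
    (1, "go"): {0: Fraction(1, 2), 1: Fraction(1, 2)},
}
prior = [(Fraction(1, 2), mdp_sticky), (Fraction(1, 2), mdp_jumpy)]

before = predictive(prior, 0, [], "go")
after = predictive(prior, 0, [("go", 0)], "go")
```

After observing that the state stayed at 0, the posterior weight shifts toward `mdp_sticky`, so the predicted probability of staying at 0 again rises from 1/2 to 5/8.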
  For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
  In this setting, there are two levels of learning algorithms:
  - The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate $ζ (h)$ for any $h$ ). In particular, here we don’t worry about exploitation/exploration tradeoffs.
  - The “metacontrol” algorithm, which given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
  In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two need to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. the histories starting from the history that already occurred). Such an agent satisfies “opportunistic” guarantees: if at any point in time the posterior admits a useful metahypothesis, the agent can exploit this metahypothesis. Thus, we address both parts of the problem of traps:
  - The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem of the (coarser) metahypothesis.
  - The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.
