Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to “plain” IBRL regret bounds when we consider the core and the envelope as the “inside” of the agent.
Assume that the action and observation sets factor as A=A0×A1 and O=O0×O1, where (A0,O0) is the interface with the external environment and (A1,O1) is the interface with the envelope.
Let Λ:Π→□(Γ×(A×O)ω) be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:
Marginalizing over Γ. That is, let pr−Γ:Γ×(A×O)ω→(A×O)ω and pr0:(A×O)ω→(A0×O0)ω be the projections. Then, we have the law Λ?:=(pr0pr−Γ)∗∘Λ.
Assuming “logical omniscience”. That is, let τ∗∈Γ be the ground truth. Then, we have the law Λ!:=pr0∗(Λ∣τ∗). Here, we use the conditional defined by Θ∣A:={θ∣A∣θ∈argmaxΘPr[A]}. It’s easy to see this indeed defines a law.
However, requiring low regret w.r.t. neither of these is equivalent to low regret w.r.t Λ:
Learning Λ? is typically no less feasible than learning Λ, however it is a much weaker condition. This is because the metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
Learning Λ! is a much stronger condition than learning Λ, however it is typically infeasible. Requiring it leads to AIXI-like agents.
Therefore, metacognitive regret bounds hit a “sweep spot” of stength vs. feasibility which produces a genuinely more powerful agents than IBRL[1].
More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDP). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crsip laws, since there’s a very general theorem for representing desiderata as laws. But, these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Godel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
Here is my proposal for how to formulate a theorem that would make this idea rigorous.
(Wrong) First Attempt
Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis Θ represents an infradistribution over Γ: the “space of counterpossible computational universes”. We can say that Θ is a “true hypothesis” when there is some θ in the credal set Θ (a distribution over Γ) s.t. the ground truth Υ∗∈Γ “looks” as if it’s sampled from θ. The latter should be formalizable via something like a computationally bounded version of Marin-Lof randomness.
We can now try to say that Υ∗ is “rich” if for any true hypothesis Θ, there is a refinement Ξ⊆Θ which is also a true hypothesis and “knows” at least one bit of information that Θ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes Υ∗. But, it’s also completely boring: the required Ξ can be constructed by “hardcoding” an additional fact into Θ. This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
(Wrong) Second Attempt
What if instead we require that Ξ knows infinitely many bits of information that Θ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the n-th bit in the sequence depends only the largest k s.t.2k divides n, but the dependence on k is already uncomputable (or at least inexpressible by a finite automaton).
However, for our original application, this is entirely insufficient. This is because in the formal language we use to define Γ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
(Less Wrong) Third Attempt
Here’s the improved condition: For any true hypothesis Θ, there is a true refinement Ξ⊆Θ s.t. conditioning Θon any finite set of observations cannot produce a refinement ofΞ.
There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
For any given observation A and p∈(0,1], we can form {θ∈Θ∣θ(A)≥p}.
For any given observation A s.t. minθ∈Θθ(A)>0, we can form {(θ∣A)∣θ∈Θ}.
This rules-out the counterexample from before: the easy equivalence relation can be represented inside Θ, and then the entire sequence of “novel” bits can be generated by a conditioning.
Alright, so does Υ∗ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.
Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior ζ over environments, s.t. we can efficiently compute the distribution ζ(h) over observations given any history h. For example, any prior over a small set of MDP hypotheses would qualify. Now, for each h, we regard ζ(h) as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” ξ consisting of metahypotheses: hypotheses-about-programs.
For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
In this setting, there are two levels of learning algorithms:
The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate ζ(h) for any h). In particular, here we don’t worry about exploitation/exploration tradeoffs.
The “metacontrol” algorithm, which given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two needs to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. the histories starting from the history that already occurred). Such an agent satisfies “opportunistic” guarantees: if at any point of time, the posterior admits a useful metahypothesis, the agent can exploit this metahypothesis. Thus, we address both parts of the problem of traps:
The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem of the (coarser) metahypothesis.
The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.
Master post for ideas about metacognitive agents.
Sort of obvious but good to keep in mind: Metacognitive regret bounds are not easily reducible to “plain” IBRL regret bounds when we consider the core and the envelope as the “inside” of the agent.
Assume that the action and observation sets factor as A=A0×A1 and O=O0×O1, where (A0,O0) is the interface with the external environment and (A1,O1) is the interface with the envelope.
Let Λ:Π→□(Γ×(A×O)ω) be a metalaw. Then, there are two natural ways to reduce it to an ordinary law:
Marginalizing over Γ. That is, let pr−Γ:Γ×(A×O)ω→(A×O)ω and pr0:(A×O)ω→(A0×O0)ω be the projections. Then, we have the law Λ?:=(pr0pr−Γ)∗∘Λ.
Assuming “logical omniscience”. That is, let τ∗∈Γ be the ground truth. Then, we have the law Λ!:=pr0∗(Λ∣τ∗). Here, we use the conditional defined by Θ∣A:={θ∣A∣θ∈argmaxΘPr[A]}. It’s easy to see this indeed defines a law.
However, requiring low regret w.r.t. neither of these is equivalent to low regret w.r.t Λ:
Learning Λ? is typically no less feasible than learning Λ, however it is a much weaker condition. This is because the metacognitive agents can use policies that query the envelope to get higher guaranteed expected utility.
Learning Λ! is a much stronger condition than learning Λ, however it is typically infeasible. Requiring it leads to AIXI-like agents.
Therefore, metacognitive regret bounds hit a “sweep spot” of stength vs. feasibility which produces a genuinely more powerful agents than IBRL[1].
More precisely, more powerful than IBRL with the usual sort of hypothesis classes (e.g. nicely structured crisp infra-RDP). In principle, we can reduce metacognitive regret bounds to IBRL regret bounds using non-crsip laws, since there’s a very general theorem for representing desiderata as laws. But, these laws would have a very peculiar form that seems impossible to guess without starting with metacognitive agents.
Formalizing the richness of mathematics
Intuitively, it feels that there is something special about mathematical knowledge from a learning-theoretic perspective. Mathematics seems infinitely rich: no matter how much we learn, there is always more interesting structure to be discovered. Impossibility results like the halting problem and Godel incompleteness lend some credence to this intuition, but are insufficient to fully formalize it.
Here is my proposal for how to formulate a theorem that would make this idea rigorous.
(Wrong) First Attempt
Fix some natural hypothesis class for mathematical knowledge, such as some variety of tree automata. Each such hypothesis Θ represents an infradistribution over Γ: the “space of counterpossible computational universes”. We can say that Θ is a “true hypothesis” when there is some θ in the credal set Θ (a distribution over Γ) s.t. the ground truth Υ∗∈Γ “looks” as if it’s sampled from θ. The latter should be formalizable via something like a computationally bounded version of Marin-Lof randomness.
We can now try to say that Υ∗ is “rich” if for any true hypothesis Θ, there is a refinement Ξ⊆Θ which is also a true hypothesis and “knows” at least one bit of information that Θ doesn’t, in some sense. This is clearly true, since there can be no automaton or even any computable hypothesis which fully describes Υ∗. But, it’s also completely boring: the required Ξ can be constructed by “hardcoding” an additional fact into Θ. This doesn’t look like “discovering interesting structure”, but rather just like brute-force memorization.
(Wrong) Second Attempt
What if instead we require that Ξ knows infinitely many bits of information that Θ doesn’t? This is already more interesting. Imagine that instead of metacognition / mathematics, we would be talking about ordinary sequence prediction. In this case it is indeed an interesting non-trivial condition that the sequence contains infinitely many regularities, s.t. each of them can be expressed by a finite automaton but their conjunction cannot. For example, maybe the n-th bit in the sequence depends only the largest k s.t.2k divides n, but the dependence on k is already uncomputable (or at least inexpressible by a finite automaton).
However, for our original application, this is entirely insufficient. This is because in the formal language we use to define Γ (e.g. combinator calculus) has some “easy” equivalence relations. For example, consider the family of programs of the form “if 2+2=4 then output 0, otherwise...”. All of those programs would output 0, which is obvious once you know that 2+2=4. Therefore, once your automaton is able to check some such easy equivalence relations, hardcoding a single new fact (in the example, 2+2=4) generates infinitely many “new” bits of information. Once again, we are left with brute-force memorization.
(Less Wrong) Third Attempt
Here’s the improved condition: For any true hypothesis Θ, there is a true refinement Ξ⊆Θ s.t. conditioning Θ on any finite set of observations cannot produce a refinement of Ξ.
There is a technicality here, because we’re talking about infradistributions, so what is “conditioning” exactly? For credal sets, I think it is sufficient to allow two types of “conditioning”:
For any given observation A and p∈(0,1], we can form {θ∈Θ∣θ(A)≥p}.
For any given observation A s.t. minθ∈Θθ(A)>0, we can form {(θ∣A)∣θ∈Θ}.
This rules-out the counterexample from before: the easy equivalence relation can be represented inside Θ, and then the entire sequence of “novel” bits can be generated by a conditioning.
Alright, so does Υ∗ actually satisfy this condition? I think it’s very probable, but I haven’t proved it yet.
Recording of a talk I gave in VAISU 2023.
Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior ζ over environments, s.t. we can efficiently compute the distribution ζ(h) over observations given any history h. For example, any prior over a small set of MDP hypotheses would qualify. Now, for each h, we regard ζ(h) as a “program” that the agent can execute and form beliefs about. In particular, we have a “metaprior” ξ consisting of metahypotheses: hypotheses-about-programs.
For example, if we let every metahypothesis be a small infra-RDP satisfying appropriate assumptions, we probably have an efficient “metalearning” algorithm. More generally, we can allow a metahypothesis to be a learnable mixture of infra-RDPs: for instance, there is a finite state machine for specifying “safe” actions, and the infra-RDPs in the mixture guarantee no long-term loss upon taking safe actions.
In this setting, there are two levels of learning algorithms:
The metalearning algorithm, which learns the correct infra-RDP mixture. The flavor of this algorithm is RL in a setting where we have a simulator of the environment (since we can evaluate ζ(h) for any h). In particular, here we don’t worry about exploitation/exploration tradeoffs.
The “metacontrol” algorithm, which given an infra-RDP mixture, approximates the optimal policy. The flavor of this algorithm is “standard” RL with exploitation/exploration tradeoffs.
In the simplest toy model, we can imagine that metalearning happens entirely in advance of actual interaction with the environment. More realistically, the two needs to happen in parallel. It is then natural to apply metalearning to the current environmental posterior rather than the prior (i.e. the histories starting from the history that already occurred). Such an agent satisfies “opportunistic” guarantees: if at any point of time, the posterior admits a useful metahypothesis, the agent can exploit this metahypothesis. Thus, we address both parts of the problem of traps:
The complexity-theoretic part (subproblem 1.2) is addressed by approximating the intractable Bayes-optimality problem by the metacontrol problem of the (coarser) metahypothesis.
The statistical part (subproblem 2.1) is addressed by opportunism: if at some point, we can easily learn something about the physical environment, then we do.