Telephone Theorem, Redundancy/Resampling, and Maxent for the math, Chaos for the concepts.
Thank you!
Just because something can be learned efficiently doesn’t mean it’s convergent for a wide variety of cognitive systems.
I believe that the relevant cognitive systems all look like learning algorithms for a prior of a certain fairly specific type. I don’t know what this prior looks like, but it’s something very rich on the one hand and efficiently learnable on the other hand. So, if you showed that your formalism naturally produces priors that seem closer to that “holy grail prior”, in terms of richness/efficiency, than the priors we already know (e.g. MDPs with a small number of states, which are not rich enough, or the Solomonoff prior, which is both statistically and computationally intractable), that would at least be evidence that you’re going in the right direction.
And even if such hypothesis classes couldn’t be learned efficiently in full generality, it would still be possible for a subset of that hypothesis class to be convergent for a wide variety of cognitive systems, in which case general properties of the hypothesis class would still apply to those systems’ cognition.
Hmm, I’m not sure what it would mean for a subset of a hypothesis class to be “convergent”.
The question we actually want to ask here is “Is abstraction, as captured by John’s formalism, instrumentally convergent for a wide variety of cognitive systems?”
That’s interesting, but I’m still not sure what it means exactly. Let’s say we take a reinforcement learner with a specific hypothesis class, such as all MDPs of a certain size, or some family of MDPs with low eluder dimension, or the actual AIXI. How would you determine whether your formalism is “instrumentally convergent” for each of those? Is there a rigorous way to state the question?