Epistemic status: some of these ideas only crystallized today. Normally I would take at least a few days to process them before posting, to make sure there are no glaring holes in the reasoning, but I saw this thread and decided to reply since it’s topical.
Suppose that your imitator works by something akin to Bayesian inference with some sort of bounded simplicity prior (I think this is true of transformers). In order for Bayesian inference to converge to exact imitation, you usually need realizability. Obviously we don’t have realizability today, because the ANNs currently in use are not big enough to contain a brain, but we’re gradually getting closer[1].
More precisely, as ANNs grow in size we’re approaching a regime I dubbed “pseudorealizability”: on the one hand, the brain is in the prior[2]; on the other hand, its description complexity is pretty high and therefore its prior probability is pretty low. Moreover, a more sophisticated agent (e.g. infra-Bayesian RL / Turing RL / infra-Bayesian physicalist) would be able to use the rest of the world as useful evidence to predict some features of the human brain (i.e. even though human brains are complex, they are not random: there are reasons they came to be the way they are, if you understand the broader context, e.g. evolutionary history). But the latter kind of inference does not take the form of having a (non-mesa-optimizing) complete Cartesian parsimonious model of the world in which brains are a particular piece, because (i) such a model would be too computationally expensive (non-realizability) and (ii) bridge rules add a lot of complexity.
Hence, the honest-imitation hypothesis is heavily penalized compared to hypotheses that are in themselves agents which are more “epistemically sophisticated” than the outer loop of the AI. Why agents rather than some kind of non-agentic epistemic engines? Because IB and IBP suggest that this level of epistemic sophistication requires some entanglement between epistemic rationality and instrumental rationality: in these frameworks, it is not possible to decouple the two entirely.
From the perspective of the outer loop, we can describe the situation as: “I woke up, expecting to see a world that is (i) simple and (ii) computationally cheap. At first glance, the world seemed like, not that. But, everything became clear when I realized that the world is generated by a relatively-simple-and-cheap ‘deity’ who made the world like this on purpose because it’s beneficial for it from its own strange epistemic vantage point.”
Coming back to the question of when to expect the transition from imitation-upstream-of-optimization to imitation-downstream-of-optimization: by the above line of argument, we should expect this transition to happen before the AI succeeds at any task which requires reasoning at least as sophisticated as the kind of reasoning that allows inferring properties of human brains from understanding the broader context of the world. Unfortunately, I cannot atm cash this out into a concrete milestone, but (i) it seems very believable that current language models are not there and (ii) maybe if we think about it more, we can come up with such a milestone.
[1] Cotra’s report is a relevant point of reference, even though “having as many parameters as the brain according to some way to count brain-parameters” is ofc not the same as “capable of representing something which approximates the brain up to an error term that behaves like random noise”.
[2] Assuming the training protocol is sufficiently good at decoupling the brain from the surrounding (more complex) world and pointing the AI at only trying to imitate the brain.
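To make the realizability point in the comment above concrete, here is a minimal toy sketch (not anything from the thread itself): exact Bayesian updating over a handful of coin-bias hypotheses. The hypothesis classes, the 2^-40 prior weight, and the helper names `posterior`, `predictive` and `alternating` are all assumptions chosen for illustration. It only illustrates the “in the prior, but with low prior probability” part of pseudorealizability; it says nothing about agentic hypotheses.

```python
import math


def posterior(hypotheses, prior, data):
    """Exact Bayesian posterior over Bernoulli biases, given a 0/1 sequence."""
    ones = sum(data)
    zeros = len(data) - ones
    log_post = [
        math.log(w) + ones * math.log(p) + zeros * math.log(1.0 - p)
        for p, w in zip(hypotheses, prior)
    ]
    top = max(log_post)  # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predictive(hypotheses, post):
    """Posterior-averaged probability that the next symbol is 1."""
    return sum(p * w for p, w in zip(hypotheses, post))


def alternating(n):
    """Deterministic stand-in for a fair source: half ones, half zeros."""
    return [i % 2 for i in range(n)]


# Realizable: the true bias 0.5 is one of the hypotheses.
realizable = posterior([0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3], alternating(400))

# Non-realizable: 0.5 is missing, so the posterior settles on the closest
# wrong hypothesis (0.7) and the prediction stays off however much data arrives.
misspecified = posterior([0.2, 0.7], [0.5, 0.5], alternating(400))
misspecified_pred = predictive([0.2, 0.7], misspecified)

# "In the prior but improbable": the truth is present with weight 2^-40,
# standing in for a hypothesis with high description complexity.
tiny = 2.0 ** -40
few = posterior([0.7, 0.5], [1.0 - tiny, tiny], alternating(100))
many = posterior([0.7, 0.5], [1.0 - tiny, tiny], alternating(1000))

print(realizable[1], misspecified_pred, few[1], many[1])
```

In this toy setup the low-prior true hypothesis still has negligible posterior weight after 100 observations, and only takes over after several hundred, once the accumulated likelihood advantage outweighs the 40-bit prior penalty.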
Hence, the honest-imitation hypothesis is heavily penalized compared to hypotheses that are in themselves agents which are more “epistemically sophisticated” than the outer loop of the AI.
In a deep learning context, the latter hypothesis seems much more heavily favored when using a simplicity prior (since gradient descent is simple to specify) than a speed prior (since gradient descent takes a lot of computation). So as long as the compute costs of inference remain smaller than the compute costs of training, a speed prior seems more appropriate for evaluating how easily hypotheses can become more epistemically sophisticated than the outer loop.
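One rough way to write down the two priors being contrasted here (my notation, in the spirit of Levin-style complexity, and not necessarily what either commenter has in mind): let K(h) be the description length of hypothesis h in bits and t(h) its running time in steps. Both symbols are assumptions introduced for this sketch.

```latex
\[
  w_{\mathrm{simp}}(h) \;\propto\; 2^{-K(h)},
  \qquad
  w_{\mathrm{speed}}(h) \;\propto\; 2^{-\left(K(h) + \log_2 t(h)\right)} \;=\; \frac{2^{-K(h)}}{t(h)} .
\]
```

Under this reading, a hypothesis that is short to specify but slow to run (the comment’s example is gradient descent) keeps a high weight under the simplicity prior, but has that weight divided by its running time under the speed prior.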
Not quite sure what you’re saying here. Is the claim that speed penalties would help shift the balance against mesa-optimizers? Solutions of this kind are worth looking into, but I’m not too optimistic about them atm. First, the mesa-optimizer probably won’t add a lot of overhead compared to the considerable complexity of emulating a brain. In particular, it need not work by anything like our own ML algorithms. So, if it’s possible to rule out mesa-optimizers like this, it would require a rather extreme penalty. Second, there are limits on how much you can shape the prior while still having feasible learning. And I suspect that such an extreme speed penalty would not cut it. Third, depending on the setup, an extreme speed penalty might harm generalization[1]. But we definitely need to understand it more rigorously.
[1] The most appealing version is Christiano’s “minimal circuits”, but that only works for inputs of fixed size. It’s not so clear what the variable-input-size (“transformer”) version of that is.
No, I wasn’t advocating adding a speed penalty, I was just pointing at a reason to think that a speed prior would give a more accurate answer to the question of “which is favored” than the bounded simplicity prior you’re assuming:
Suppose that your imitator works by something akin to Bayesian inference with some sort of bounded simplicity prior (I think it’s true of transformers)
But now I realise that I don’t understand why you think this is true of transformers. Could you explain? It seems to me that there are many very simple hypotheses which take a long time to calculate, and which transformers therefore can’t be representing.
The word “bounded” in “bounded simplicity prior” referred to bounded computational resources. A “bounded simplicity prior” is a prior which involves either a “hard” (i.e. some hypotheses are excluded) or a “soft” (i.e. some hypotheses are down-weighted) bound on computational resources (or both), and also an inductive bias towards simplicity (specifically it should probably behave as ~ 2^{-description complexity}). For a concrete example, see the prior I described here (w/o any claim to originality).
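As a minimal sketch of the definition above (and not the specific prior linked in the comment), the snippet below assigns weights of the form 2^{-description complexity}, with an optional hard cutoff and an optional soft penalty on running time. The function name, the “bits per doubling of runtime” form of the soft penalty, and the four toy hypotheses are all assumptions made up for illustration.

```python
import math


def bounded_simplicity_prior(hypotheses, hard_step_limit, soft_bits_per_doubling):
    """Normalized prior weights for (name, description_bits, runtime_steps) triples.

    Hard bound: hypotheses slower than hard_step_limit get weight 0.
    Soft bound: each doubling of runtime costs soft_bits_per_doubling extra bits.
    Simplicity bias: the weight scales as 2^-(description_bits).
    """
    weights = {}
    for name, description_bits, runtime_steps in hypotheses:
        if runtime_steps > hard_step_limit:
            weights[name] = 0.0  # excluded outright by the hard bound
            continue
        penalty_bits = soft_bits_per_doubling * math.log2(runtime_steps)
        weights[name] = 2.0 ** -(description_bits + penalty_bits)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical hypotheses, invented for illustration only.
HYPOTHESES = [
    ("short_fast", 10, 2 ** 4),
    ("short_slow", 10, 2 ** 20),
    ("long_fast", 30, 2 ** 4),
    ("short_but_too_slow", 5, 2 ** 40),
]

# Hard bound only: runtime matters only through the cutoff.
hard_only = bounded_simplicity_prior(HYPOTHESES, 2 ** 30, 0.0)

# Hard and soft bound together: slow hypotheses are also down-weighted.
hard_and_soft = bounded_simplicity_prior(HYPOTHESES, 2 ** 30, 1.0)

for name in hard_only:
    print(name, hard_only[name], hard_and_soft[name])
```

With only the hard bound, the two 10-bit hypotheses get equal weight regardless of speed; adding the soft penalty separates them, while the shortest hypothesis is excluded in both cases because it exceeds the step limit.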
Ah, I see. That makes sense now!
This seems like a good thing to keep in mind, but also sounds too pessimistic about the ability of gradient descent to find inference algorithms that update more efficiently than gradient descent.
I do expect this to happen. The question is merely: what’s the best predictor of how hard it is to find inference algorithms more effective than gradient descent? Is it whether those inference algorithms are more complex than gradient descent? Or is it whether those inference algorithms run for longer than gradient descent? Since gradient descent is very simple but takes a long time to run, my bet is the latter: there are many simple ways to convert compute to optimisation, but few compute-cheap ways to convert additional complexity to optimisation.
Faster than gradient descent is not a selective pressure, at least if we’re considering typical ML training procedures. What is a selective pressure is regularization, which functions much more like a complexity prior than a speed prior.
So (again sticking to modern day ML as an example, if you have something else in mind that would be interesting) of course there will be a cutoff in terms of speed, excluding all algorithms that don’t fit into the neural net. But among algorithms that fit into the NN, the penalty on their speed will be entirely explainable as a consequence of regularization that e.g. favors circuits that depend on fewer parameters, and would therefore be faster after some optimization steps.
If examples of successful parameters were sparse and tended to just barely fit into the NN, then this speed cutoff would be very important. But in the present day we see that good parameters tend to be pretty thick on the ground, and you can fairly smoothly move around in parameter space to make different tradeoffs.
Here’s my stab at rephrasing this argument without reference to IB. Would appreciate corrections, and any pointers on where you think the IB formalism adds to the pre-theoretic intuitions:
At some point imitation will progress to the point where models use information about the world to infer properties of the thing they’re trying to imitate (humans) -- e.g. human brains were selected under some energy efficiency pressure, and so have certain properties. The relationship between “things humans are observed to say/respond to” and “how the world works” is extremely complex. Imitation-downstream-of-optimization is simpler. What’s more, imitation-downstream-of-optimization can be used to model (some of) the same things the brain-in-world strategy can. A speculative example: a model learns that humans use a bunch of different reasoning strategies (deductive reasoning, visual-memory search, analogizing...) and does a search over these strategies to see which one best fits the current context. This optimization-to-find-imitation is simpler than learning the evolutionary/cultural/educational world model which explains why the human uses one strategy over another in a given context.