I’m only now really learning about Solomonoff induction. I think I didn’t look into it earlier since I often heard things along the lines of “It’s not computable, so it’s not relevant”.
But...
It’s lower semicomputable: You can actually approximate it arbitrarily well, you just don’t know how good your approximations are (see the toy sketch at the end of this comment).
It predicts well: It’s provably a really good predictor under the reasonable assumption of a computable world.
It’s how science works: You focus on simple hypotheses and discard/reweight them according to Bayesian reasoning.
It’s mathematically precise.
What more do you want?
The fact that my master’s degree in AI at the UvA didn’t teach this to us seems like a huge failure.
Unfortunately, I don’t think that “this is how science works” is really true. Science focuses on having a simple description of the world, while Solomonoff induction cares about the simplicity of the description of the world plus your place in it.
This leads to some really weird consequences, which people sometimes refer to as Solomonoff induction being malign.
I was enamored with Solomonoff induction too, but over time I encountered more and more problems with it that, AFAIK, nobody has made much progress on. So my answer to “what more do you want” is solutions to these (and other) problems, or otherwise dissolving my confusions about them.
Some degree of real-life applicability. If your mathematically precise framework nonetheless requires way more computing power than is available around you (or, in some cases, in the entire observable universe) to approximate it properly, you have a serious practical issue.
The percentage of scientists I know who use explicit Bayesian updating[1] to reweight hypotheses is a flat 0%. They use Occam’s razor-type intuitions, and those intuitions can be formalized using Solomonoff induction,[2] but that doesn’t mean they are using the latter.
> reasonable assumption of a computable world

Reasonable according to what? Substance-free vibes from the Sequences? The map is not the territory. A simplifying mathematical description need not represent the ontologically correct way of identifying something in the territory.
> It predicts well: It’s provably a really good predictor

So can you point to any example of anyone ever predicting anything using it?
Or universal Turing machines to compute the description lengths of programs meant to represent real-world hypotheses.
Except those intuitions are about science in the real world, and Solomonoff induction requires computability, and even if you approximate it, it requires so much computing power that… oh, hey, same objection as before!
I think we may not disagree about any truth-claims about the world. I’m just satisfied that the north star of Solomonoff induction exists at all, and that it is as computable (albeit only semicomputable), well-predicting, science-compatible and precise as it is. I expected less from a theory that seems so unpopular.
> So can you point to any example of anyone ever predicting anything using it?

No, but crucially, I’ve also never seen anyone predict as well as someone using Solomonoff induction with any other method :)
Also, there’s actually a decent argument that LLMs can be viewed as approximating something like Solomonoff induction. For instance, my ARENA final project studied the ability of LLMs to approximate Solomonoff induction, with pretty good results.
Lately there has been some (still limited) empirical success pretraining transformers on program outputs or some such, inspired directly by Solomonoff induction; see “universal pretraining”.
There are some ways in which solomonoff induction and science are analogous[1], but there are also many important ways in which they are disanalogous. Here are some ways in which they are disanalogous:
A scientific theory is much less like a program that prints (or predicts) an observation sequence than it is like a theory in the sense used in logic. Like, a scientific theory provides a system of talking which involves some sorts of things (eg massive objects) about which some questions can be asked (eg each object has a position and a mass, and between any pair of objects there is a gravitational force) with some relations between the answers to these questions (eg we have an axiom specifying how the gravitational force depends on the positions and masses, and an axiom specifying how the second derivative of the position relates to the force).[2]
Science is less in the business of predicting arbitrary observation sequences, and much more in the business of letting one [figure out]/understand/exploit very particular things — like, the physics someone knows is going to be of limited help when they try to predict the time sequence of intensities of pixel (x,y) on their laptop screen, but it is going to help them a lot when solving the kinds of problems that would show up in a physics textbook.
Even for solving problems that a theory is supposed to help one solve (and for the predictions it is supposed to help one make), a scientific theory is highly incomplete — in addition to the letter of the theory, a human solving the problems in a classical mechanics textbook will be majorly relying on tacit understanding gained from learning classical mechanics and their common-sense understanding.
Making scientific progress looks less like picking out a correct hypothesis from some set of pre-well-specified hypotheses by updating on data, and much more like coming up with a decent way to think about something where there previously wasn’t one. E.g. it could look like Faraday staring at metallic filings near a magnet and starting to talk about the lines he was seeing, or Lorentz, Poincaré, and Einstein making sense of the result of the Michelson-Morley experiment. Imo the bayesian conception basically completely fails to model gaining scientific understanding.
Scientific theories are often created to do something — I mean: to do something other than predicting some existing data — e.g., to make something; e.g., see https://en.wikipedia.org/wiki/History_of_thermodynamics.
Scientific progress also importantly involves inventing new things/phenomena to study. E.g., it would have been difficult to find things that Kirchhoff’s laws could help us with before we invented electric circuits (ditto for lens optics and lenses).
Idk, there is just very much to be said about the structure of science and scientific progress that doesn’t show up in the solomonoff picture (or maaaybe at best in some cases shows up inexplicitly inside the inductor). I’ll mention a few more things off the top of my head:
having multiple ways to think about something
creating new experimental devices/setups
methodological progress (e.g. inventing instrumental variable methods in econometrics)
mathematical progress (e.g. coming up with the notion of a derivative)
having a sense of which things are useful/interesting to understand
generally, a human scientific community doing science has a bunch of interesting structure; in particular, the human minds participating in it have a bunch of interesting structure; one in fact needs a bunch of interesting structure to do science well; in fact, more structure of various kinds is gained when making scientific progress; basically none of this is anywhere to be seen in solomonoff induction
for example, that usually, a scientific theory could be used for making at least some fairly concrete predictions
To be clear: I don’t intend this as a full description of the character of a scientific theory — e.g., I haven’t discussed how it gets related to something practical/concrete like action (or maybe (specifically) prediction). A scientific theory and a theory-in-the-sense-used-in-logic are ultimately also disanalogous in various ways — I’m only claiming it’s a better analogy than that between a scientific theory and a predictive model.
Thanks a lot for this very insightful comment!
Versus: it only predicts.
Scientific epistemology has a distinction between realism and instrumentalism. According to realism, a theory tells you what kind of entities do and do not exist. According to instrumentalism, a theory is restricted to predicting observations. If a theory is empirically adequate, if it makes only correct predictions within its domain, that’s good enough for instrumentalists. But the realist is faced with the problem that multiple theories can make good predictions yet imply different ontologies, of which only one can be ultimately correct, so some criterion beyond empirical adequacy is needed.
On the face of it, Solomonoff inductors contain computer programmes, not explanations, not hypotheses and not descriptions. (I am grouping explanations, hypotheses and beliefs as things which have a semantic interpretation, which say something about reality. In particular, physics has a semantic interpretation in a way that maths does not.)
The Yudkowskian version of Solomonoff induction switches from talking about programs to talking about hypotheses as if they are obviously equivalent. Is it obvious? There’s a vague and loose sense in which physical theories “are” maths, and computer programmes “are” maths, and so on. But there are many difficulties in the details. Not all maths is computable. Neither mathematical equations nor computer programmes contain straightforward ontological assertions like “electrons exist”. The question of how to interpret physical equations is difficult and vexed. And a Solomonoff inductor contains programmes, not typical physics equations. Whatever problems there are in interpreting maths ontologically are compounded when you have the additional stage of inferring maths from programmes.
In physics, the meanings of the symbols are taught to students, rather than being discovered in the maths. Students are taught that in f=ma, “f” is force, “m” is mass and “a” is acceleration. The equation itself, as pure maths, does not determine the meaning. For instance, it has the same mathematical form as P=IV, which “means” something different. Physics and maths are not the same subject, and the fact that physics has a real-world semantics is one of the differences.
Similarly, the instructions in a programme have semantics related to programme operations, but not to the outside world. Machine code instructions do things like “add 1 to register A”. You would have to look at thousands or millions of such low-level instructions to infer what kind of high-level maths (vector spaces, or non-Euclidean geometry) the programme is executing.
> It’s how science works: You focus on simple hypotheses and discard/reweight them according to Bayesian reasoning.

It’s not how science works, because science doesn’t generate hypotheses mechanically, and doesn’t attempt to brute-force-search them.
Relevance to bounded agents like us, and not being sensitive to an arbitrary choice of language. More on the latter (h/t Jesse Clifton):

> The problem is that Kolmogorov complexity depends on the language in which algorithms are described. Whatever you want to say about invariances with respect to the description language, this has the following unfortunate consequence for agents making decisions on the basis of finite amounts of data: For any finite sequence of observations, we can always find a silly-looking language in which the length of the shortest program outputting those observations is much lower than that in a natural-looking language (but which makes wildly different predictions of future data). For example, we can find a silly-looking language in which “the laws of physics have been as you think they are ’til now, but tomorrow all emeralds will turn blue” is simpler than “all emeralds will stay green and the laws of physics will keep working”...

> You might say, “Well we shouldn’t use those languages because they’re silly!” But what are the principles by which you decide a language is silly? We would suggest that you start with the actual metaphysical content of the theories under consideration, the claims they make about how the world is, rather than the mere syntax of a theory in some language.
I feel like Cunningham’s law got confirmed here. I’m really glad about all the things I learned from people who disagreed with me.
It definitely seems worth knowing about and understanding, but stuff like needing to specify a universal Turing machine does still give me pause. It doesn’t make it uninsightful, but I do still think there is more work to do to really understand induction.
Agreed. The primary thing Solomonoff induction doesn’t take into account is computational complexity / compute. But… you can simply include a reasonable time-penalty and most of the results mostly go through (a sketch of one such penalty appears below). It becomes a bit more like logical inductors.
Solomonoff induction also dovetails (hah) nicely with the fact that next-token prediction was all you need for intelligence.[1]
Well, almost: the gap is exactly AIXI.
If logical inductors are what one wants, just do that.
> a reasonable time-penalty

I’m not entirely sure, but I suspect that I don’t want any time penalty in my (typical human) prior. E.g. even if quantum mechanics takes non-polynomial time to simulate, I still think it a likely hypothesis. A time penalty just doesn’t seem to be related to what I pay attention to when I access my prior for the laws of physics / fundamental hypotheses. There are also many other ideas for augmenting a simplicity prior that fail similar tests.
What do you mean by this?