Open Problems Related to Solomonoff Induction

Wei Dai6 Jun 2012 0:26 UTC

55 points

Solomonoff Induction seems clearly “on the right track”, but there are a number of problems with it that I’ve been puzzling over for several years and have not made much progress on. I think I’ve talked about all of them in various comments in the past, but never collected them in one place.

Apparent Unformalizability of “Actual” Induction

Argument via Tarski’s Indefinability of Truth

Informally, the theorem states that arithmetical truth cannot be defined in arithmetic. The theorem applies more generally to any sufficiently strong formal system, showing that truth in the standard model of the system cannot be defined within the system.

Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logical has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

Argument via Berry’s Paradox

Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth’s up-arrow notation). Since x has a short description, a universal distribution shouldn’t assign it such a low probability, but P does, so P can’t be a universal distribution.

Is Solomonoff Induction “good enough”?

Given the above, is Solomonoff Induction nevertheless “good enough” for practical purposes? In other words, would an AI programmed to approximate Solomonoff Induction do as well as any other possible agent we might build, even though it wouldn’t have what we’d consider correct beliefs?

Is complexity objective?

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?

Is Solomonoff an ideal or an approximation?

Is it the case that the universal prior (or some suitable generalization of it that somehow overcomes the above “unformalizability problems”) is the “true” prior and that Solomonoff Induction represents idealized reasoning, or does Solomonoff just “work well enough” (in some sense) at approximating any rational agent?

How can we apply Solomonoff when our inputs are not symbol strings?

Solomonoff Induction is defined over symbol strings (for example bit strings) but our perceptions are made of “qualia” instead of symbols. How is Solomonoff Induction supposed to work for us?

What does Solomonoff Induction actually say?

What does Solomonoff Induction actually say about, for example, whether we live in a creatorless universe that runs on physics? Or the Simulation Argument?

What links here?

Wei Dai6 Jun 2012 0:26 UTC

55 points

104 comments2 min readLW link Archive

Solomonoff induction Open Problems

Incorrect 6 Jun 2012 1:41 UTC
12 points
0

Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth’s up-arrow notation). Since x has a short description, a universal distribution shouldn’t assign it such a low probability, but P does, so P can’t be a universal distribution.

Solomonoff Induction would not consider this a description of x as it cannot be used to compute x.
- Wei Dai 6 Jun 2012 1:56 UTC
  5 points
  0
  Parent
  I guess I failed to bridge the inferential gap here. You’re right “Solomonoff Induction would not consider this a description of x as it cannot be used to compute x.” The point I’m trying to make is that a correct formalization of induction/Occam’s Razor ought (in order to satisfy our intuitions) to assign x some probability > 1/3^^^3 since it has a short (even if uncomputable) description, but a formalization based on this P would not, therefore it can’t be a correct formalization. But this is the case no matter what P we use (with different x depending on P), hence the “unformalizability” problem.
  - Adele_L 6 Jun 2012 2:01 UTC
    6 points
    0
    Parent
    Could you explain further? Why “ought” it assign it such a probability? As stated, this seems more convincing as an argument that it “ought not” to assign a probability > 1/3^^^3, despite the short “description”.
    - Wei Dai 6 Jun 2012 2:15 UTC
      5 points
      0
      Parent
      The idea is, however we define P, how can we be that sure that there isn’t some kind of uncomputable physics that would allow someone to build a device that can find the lexicographically least object x such that P(x) < 1/3^^^3 and present it us?
      - Kawoomba 7 Jun 2012 20:22 UTC
        3 points
        0
        Parent
        Maybe we should just work off of the assumption that there’s no relevant uncomputable physics, because if there were, we should probably give up our endeavors anyways, unless we knew how to model an uncomputable reality within a computable AGI-program. As Schmidhuber ever so aptly wrote on his homepage:
        
        The distribution should at least be computable in the limit. That is, there should exist a program that takes as an input any beginning of the universe history as well as a next possible event, and produces an output converging on the conditional probability of the event. If there were no such program we could not even formally specify our universe, leave alone writing reasonable scientific papers about it.
        
        Mitchell_Porter 7 Jun 2012 21:09 UTC
        4 points
        0
        Parent
        
        unless we knew how to model an uncomputable reality within a computable AGI-program
        
        You could start out by trying to understand how an AI might invent the concept of uncomputability, and how it might then proceed to the possibility of uncomputable physics. And one way to get started here is by thinking as a cognitive historian, and asking how humans came up with the concept of uncomputability.
        Vladimir_Nesov 7 Jun 2012 20:27 UTC
        0 points
        0
        Parent
        “Give up” is neither a well-defined strategy, nor one that was shown to be preferable to its alternatives.
      - Manfred 6 Jun 2012 19:31 UTC
        2 points
        0
        Parent
        That gives you a probability inversely proportional to the integral of e^-(the description length) of each number from 0 to infinity. Complexity grows like log(n)-ish. It’s an improper prior.
        
        So I agree with Adele that this may be a modus tollens / modus ponens moment.
      - skepsci 18 Aug 2013 16:56 UTC
        −1 points
        0
        Parent
        If there’s some uncomputable physics that would allow someone to build such a device, we ought to redefine what we mean by computable to include whatever the device outputs. After all, said device falsifies the Church-Turing thesis, which forms the basis for our definition of “computable”.
  - Incorrect 6 Jun 2012 2:00 UTC
    2 points
    0
    Parent
    Are you essentially saying Solomonoff Induction could be inferior to some other meta-algorithm (or whatever you call it) in uncomputable universes?
    - Wei Dai 6 Jun 2012 2:11 UTC
      7 points
      0
      Parent
      Yes. ETA: I’m also making the stronger point that any formalization of induction we can come up with would have a similar problem. (Or at least the kinds of formalizations covered by my arguments.)
  - Armok_GoB 6 Jun 2012 22:43 UTC
    1 point
    0
    Parent
    Technically not if P is sufficiently long and complex.
Eliezer Yudkowsky 27 Dec 2012 3:36 UTC
9 points
0

Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logical has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

It seems to me that the paradox may lie within this problem setup, not within the agent doing the induction.

We first consider that, rather than this device being assigned zero probability, it should actually be inconceivable to the agent—there should not be a finitely describable thingy that the agent assigns zero probability of having a finitely describable property.

Why would an agent using a second-order analogue of Solomonoff induction have such conceptual problems? Well, considering how Tarski’s original undefinability theorems worked, perhaps what goes wrong is this: we want to believe that the device outputs the truth of statements about the universe. But we also want to believe this device is in the universe. So what happens if we ask the device, “Does the universe entail the sentence stating that outputs ‘No’ in response to a question which looks like ?”

Thus, such a device is inconceivable in the first place since it has no consistent model, and we are actually correct to assign zero probability to the alien’s assertion that the device produces correct questions to all questions about the universe including questions about the device itself.
- Wei Dai 27 Dec 2012 4:07 UTC
  1 point
  1
  Parent
  
  we want to believe that the device outputs the truth of statements about the universe
  
  I’m not sure why you say this, because the device is supposed to output the truth of statements about some second-order logic, not about the universe. The device is not describable by the second-order logic via Tarski, so if the device is in the universe, the universe must only be describable by some meta-logic, which implies the device is outputting truth of statements about something strictly simpler than the universe. The agent ought to be able to conceive of this...
  - Eliezer Yudkowsky 27 Dec 2012 4:33 UTC
    0 points
    0
    Parent
    In that case I’m not sure what you mean by “second-order analogue of Solomonoff induction”. Solomonoff induction is about the universe and proceeds from sensory experiences. What does it mean to induct that something may produce the truth of sentences “about a second-order logic”?
    - Wei Dai 27 Dec 2012 5:10 UTC
      0 points
      0
      Parent
      
      What does it mean to induct that something may produce the truth of sentences “about a second-order logic”?
      
      I mean suppose you encounter a device that outputs “true” or “false” whenever you feed it a sentence in some second-order logic, and after doing a lot of tests, you think maybe the device outputs “true” if and only if the input sentence is actually true, regardless of what input it receives. This seems like a straightforward analogue of inducting that something may be a Halting Oracle, which I thought you agreed an agent should be able to do?
      
      I’m not sure if this answers your question. If not, feel free to find me on Skype or Google Chat to talk more.
      - Eliezer Yudkowsky 27 Dec 2012 6:35 UTC
        0 points
        0
        Parent
        By “true” do you mean that the physical universe semantically entails the sentence, or that the sentence is a semantic tautology of second-order logic? I’m assuming the latter, since the former case is what I was assuming when I gave my Tarski reply.
        
        So… I’m pretty sure you can represent a set’s semantic entailment of a Henkin interpretation of a second-order sentence in a first-order set theory. I’m less sure that you can represent entailment of a second-order sentence inside a second-order set theory, but I’m having trouble seeing why you wouldn’t be able to do that. I also think that second-order set theories are supposed to be finitely axiomatizable, though I could be wrong about this. But then I’m not quite sure why we don’t get into Tarskian trouble.
        
        Let ST be the finite axiomatization of a second-order set theory. Then in second-order logic, why doesn’t the sentence (ST->”All sets: Set entails S”) form a truth predicate for S? My guess is that there’s just no theorem which says, “X is true (entailed by the mathematical universal set / model we’re currently operating inside) iff ‘X’ is entailed by all sets (inside the current model)”.
        
        If this is so, then it looks to me like for any given set of axioms corresponding to a second-order set theory, second-order logic can represent the idea of a device that outputs sentences which are semantically entailed by every individual set inside that theory. So the new answer would be that you are welcome to hypothesize that a device prints out truths of second-order logic, for any given background second-order set theory which provides a universe of models against which those sentences are judged universally semantically true.
        
        In which case the indefinite extensibility gets packed into the choice of set theory that we think this device is judging second-order validity for, if I haven’t dropped the bucket somewhere along the way.
        Wei Dai 29 Dec 2012 14:34 UTC
        0 points
        0
        Parent
        
        Then in second-order logic, why doesn’t the sentence (ST->”All sets: Set entails S”) form a truth predicate for S?
        
        Let S=”ST → false”. Then S is false in second-order logic (assuming ST is consistent), but (ST->”All sets: Set entails S”) is true, because ST’s model has to be bigger than any set (I think it’s usually taken to be the proper class of all sets), so every set entails “Not ST”.
        
        So the new answer would be that you are welcome to hypothesize that a device prints out truths of second-order logic, for any given background second-order set theory which provides a universe of models against which those sentences are judged universally semantically true.
        
        On input S=”ST → false”, your device prints out “true”, while my device prints out “false”. I still want to be able to hypothesize my device. :)
        Eliezer Yudkowsky 30 Dec 2012 0:55 UTC
        0 points
        0
        Parent
        There are totally models of ZFC containing sets that are models of ZFC. See “Grothendieck universe”. Is there a reason why it’d be different in second-order logic? I don’t think a second-order set theory would pin down a unique model, why would it? Unless you had some axiom stating that there were no more ordinals past a certain point in which case you might be able to get a unique model. Unless I’m getting this all completely wrong, since I’m overrunning my expertise here.
        
        So in retrospect I have to modify this for us to somehow suppose that the device is operating in a particular model of a second-order theory. And then my device prints out “true” (if it’s in one of the smallest models) or the device prints out “false” (if it’s in a larger model), unless the device is against the background of an ST with an upper bound imposing a unique model, in which case the device does print out “true” for ST → false and from the outside, we think that this device is about a small collection of sets so this result is not surprising.
        
        Then the question is whether it makes sense to imagine that the device is about the “largest relevant” model of a set theory—i.e., for any other similar devices, you think no other device will ever refer to a larger model than the current one, nor will any set theory successfully force a model larger than the current one—I think that’s the point at which things get semantically interesting again.
        Wei Dai 30 Dec 2012 1:57 UTC
        0 points
        0
        Parent
        
        Is there a reason why it’d be different in second-order logic?
        
        Second-order set theory is beyond my expertise too, but I’m going by this paper, which on page 8 says:
        
        We have managed to give a formal semantics for the second-order language of set theory without expanding our ontology to include classes that are not sets. The obvious alternative is to invoke the existence of proper classes. One can then tinker with the definition of a standard model so as to allow for a model with the (proper) class of all sets as its domain and the class of all ordered-pairs x, y (for x an element of y) as its interpretation function.12 The existence of such a model is in fact all it takes to render the truth of a sentence of the language of set theory an immediate consequence of its validity.
        
        So I was taking the “obvious alternative” of proper class of all sets to be the standard model for second order set theory. I don’t quite understand the paper’s own proposed model, but I don’t think it’s a set either.
        Eliezer Yudkowsky 30 Dec 2012 2:42 UTC
        0 points
        0
        Parent
        I’m not sure I believe in proper classes and in particular, I’m not sure there’s a proper class of all sets that could be the model of a second-order theory such that you could not describe any set larger than the model, and as for pinning down that model using axioms I’m pretty sure you shouldn’t be able to do that. There are analogues of the Lowenheim-Skolem theorem for sufficiently large infinities in second-order logic, I seem to recall reading.
[deleted] 6 Jun 2012 17:56 UTC
7 points
0

Is complexity objective?

This one bother me as well. Turing machine basis for complexity favours certain types of theories over others. For example, how many bits of turing machine does it take to predict, say, maxwells equations, which involve continuous fields?

(warning: original “research” following)

One temptation is to sweep this under the rug by saying that the complexity required to talk about continuous math is just a constant-sized interpreter that you put on the turing machine, but this can’t get around the fact that some thoeries won’t need that.

(BTW, is continuous math even exactly turing-computable?)

If we try to generalize and say “turing machines suck, we’ll use some other language”, it becomes clear that all langauges will have some bias. English has a bias for agenty mind-stuff, turing machines have a bias for discrete cellular automata, some imaginable basis language will have a bias for simple continuous mathematics.

I think the basis that is closest to correct would be one based on a hypothetical real-number continuous-memory hypercomputer (for obvious reasons). This of course implies that I have learned this thing, which implies some deeper form of induction over forms of induction, that would make a language that had only one operator that was just “physics” take appropriate penalization. And then thinking about that idea reveals that inducting over langauges and inducting over theories in languages is suspiciously similar.

That tempts me to say that they should not be seperate, but then I still think the general utility of continuous math should not have to be proven for every theory that uses it, which implies some sharing of complexity somehow. But that seems like nonsense, SI already handles that, I think (by each thoery being a completely defactored package deal). Now I think it would be good to find a new SI that can handle factoring (for algorithmic reasons), but maybe that is a step down the “how do we actually do induction” road, which is not what SI is about.

Thinking over this all again, it seems starting with any form of induction that sortof works will eventually come to the right answer. The hypothetical perfect form of induction that would be the best a learning machine could start with is just the actual answer. Again there’s that duality (is that the right word) between theories and induction priors.

This looks to me alot like some kind of iterative improvement algorithm, where you have to start somewhere, and where you start is chosen from the same domain as your answer (think newton’s method, or gradient descent). It seems like even starting with some human language (like english), you would eventually learn (as we have done) that building a math-interpreter is the thing to do.

Ok, that is the end of my confusion dump.

My unproven, badly specified idea that has something to do with induction:

Ok, background: All turing-complete languages can build interpreters for all others. This implies some traversable languagespace (which is a subset of programspace) where the distance between two points is the length of the interpreter to get from one language to the other. We will also want to be able to go backwards, so that we can go from any point in programspace to any other (by backing up to a good language, and then going forward to new program). Going backwards means building this point from some point in languagespace, rather than building some point in programspace from this point in languagespace) Points in programspace are defined over behavior, so the same program will be reachable from multiple languages.

(the lisp-like area gets a lot of traffic thru it. (“any sufficiently complex program contains … half of common lisp”))

Ok, so now we have a programspace with distances measured in bits. We could just put a probability distribution over the subset of programspace that makes predictions. Solomonof induction does this, with initial improbability defined by distance from turing-machine-langauge.

I think there might be some way to use this programspace idea for induction other than just being another way to look at SI programs. I’m imagining probability or something flowing around in this space updating our concept of simplicity so that when we see that newtonian mechanics, GR, and quantum mechanics are all closely reachable from some mathy point in languagespace, we then upgrade the probability of math-reachable thoeries, without even talking about what they predict. I’m also imagining if we build an AI to do induction, we say “here are some interesting points in programspace (newton, maxwell, SR, GR, quantum) that have served us well” and letting it figure out a distribution over programspace instead of “start from turing-machine-language”.

I have more ideas and I fear I am not communicating this very well.

I think this interpreter-traversable programspace is a good concept to investigate for trying to make SI more objective. Even if we just stick with something like SI, it seems like an interesting concept.

yeah...
- endoself 8 Jun 2012 4:21 UTC
  2 points
  0
  Parent
  The thing that you’re describing is a lot like finding the stationary distribution of a Markov chain. Not all Markov chains have stationary distributions and I can’t tell whether this one would, though if it does have one it would be unique. It’s an interesting idea.
  
  I should also note that it is not necessary to do anything like this to preform induction over multiple theories. Instead, we can just require a program that outputs known theories in addition to the new theory. We can ask them to be output in any form, such as predictions or source code, since converting source code to predictions is O(1) in program size. Then, if the different theories have more in common, the program computing them all will be shorter than one that must seperately specify elements of very different theories. A slightly different implementation of the same general principle is Schmidhuber’s Optimal Ordered Problem Solver.
- jsalvatier 6 Jun 2012 20:37 UTC
  2 points
  0
  Parent
  Some continuous math is commutable, some is not. There are uncomputable real numbers. Computable number.
[deleted] 6 Jun 2012 2:57 UTC
5 points
0
This is the second mention of second-order logical Solomonoff Induction, but I can’t imagine what such a thing would look like.
- Wei Dai 6 Jun 2012 17:37 UTC
  5 points
  0
  Parent
  It’s Luke and Eliezer’s term, but I guess the idea is similar to the one I had. You take some second-order theory, and let each string in the formal language that constitutes a description of X contribute n^-l to the probability of X, where n is the size of the alphabet, and l is the length of the string. Use the resulting distribution in place of the universal prior in Solomonoff Induction.
  - cousin_it 19 Aug 2012 18:14 UTC
    0 points
    0
    Parent
    Let’s say the theory is ZFC. Does the string “0 if the continuum hypothesis if true, otherwise 1” contribute to the probability of 0, or the probability of 1?
    - Wei Dai 21 Aug 2012 0:55 UTC
      0 points
      0
      Parent
      (As you most likely know, in the standard interpretation of second order ZFC the continuum hypothesis has a definite truth value, but it can’t be proved or disproved in any of the deductive systems that have been proposed for second order ZFC.) My suggestion is that the string “0 if the continuum hypothesis if true, otherwise 1” contribute to the probability of 0 if the continuum hypothesis is true, and the probability of 1 if the continuum hypothesis is false. The problem that we don’t know how to deal with a probability distribution like this seems similar to the problem of not knowing how to use the universal prior in the original Solomonoff Induction, both of which seem to be instances of logical/mathematical uncertainty, and there might be a general solution to that problem.
      
      The reason I think this is plausible is that for example P=NP may not be provable or disprovable in any formal deductive system that have been proposed, and yet any agent would still have to make decisions that are affected by its truth value, such as whether to search for a polynomial solution to 3-SAT. If there is some sort of principled way to make those decisions, perhaps the same methods can be used to make decisions involving the continuum hypothesis?
Patrick 16 Jun 2012 17:02 UTC
4 points
0
All universal Turing machines can simulate each other with logarithmic slowdown. Saying that the parameter means that complexity “subjective” is like saying the time complexity of Quicksort is “subjective” because the algorithm doesn’t specify which programming language to implement it in.
jacobt 6 Jun 2012 21:22 UTC
3 points
0
For aliens with a halting oracle:

Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don’t. Then we run the programs. Sure enough, none of the ones they say run forever halt, and some of them they say don’t run forever will halt at some point. Suppose we repeat this process a few times with different programs.

Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens which ones halt, give the non-halting ones a 0 probability of halting at every step, and give the other ones some small nonzero probability of halting on every step. Of course this strategy only works if the aliens actually have a halting oracle so it its predictions should be linearly combined with a fallback strategy.

I think that Solomonoff induction will find this strategy, because the hypothesis that the aliens have a true halting oracle is formalizable. Here’s how: we learn a function from aliens’ answers → distribution over when the programs halt. We can use the strategy of predicting that the ones that the aliens say don’t halt don’t halt, and using some fallback mechanism for predicting the ones that do halt. This strategy is computable so Solomonoff induction will find it.

For Berry’s paradox:

This is a problem with every formalizable probability distribution. You can always define a sequence that says: see what bit the predictor predicts with a higher probability, then output the opposite. Luckily Solomonoff induction does the best it can by having the estimates converge to 50%. I don’t see what a useful solution to this problem would even look like; it seems best to just treat the uncomputable sequence as subjectively random.
TimFreeman 23 Jun 2013 18:02 UTC
2 points
0

Consider an arbitrary probability distribution P, and the smallest integer (or the lexicographically least object) x such that P(x) < 1/3^^^3 (in Knuth’s up-arrow notation). Since x has a short description, a universal distribution shouldn’t assign it such a low probability, but P does, so P can’t be a universal distribution.

The description of x has to include the description of P, and that has to be computable if a universal distribution is going to assign positive probability to x.

If P has a short computable description, then yes, you can conclude that P is not a universal distribution. Universal distributions are not computable.

If the shortest computable description of P is long, then you can’t conclude from this argument that P is not a universal distribution, but I suspect that it still can’t be a universal distribution, since P is computable.

If there is no computable description of P, then we don’t know that there is a computable description of x, so you have no contradiction to start with.
Eliezer Yudkowsky 17 Aug 2012 21:59 UTC
2 points
0

Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logical has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability

Actually, what Tarski seems to show is that for any language for describing any set of universes, there just is no language representable inside those universes for representing arbitrary statements, with truth values, about “everything” including the language and the statements in it. If you try to invent such a language, it will end up inconsistent—not at the point where it tries to correctly assign truth, but at the point where it can represent truth, due to analogues of “This statement is false.” It isn’t needful to assign 0 or 1, in particular, to this statement; the moment you represent it, you can prove an inconsistency. But is this really proper to blame on Solomonoff induction?
- Wei Dai 18 Aug 2012 2:15 UTC
  0 points
  0
  Parent
  
  But is this really proper to blame on Solomonoff induction?
  
  I would like to have a method of induction that for any formal language, assigns a non-zero prior to the existence of a device that can enumerate or decide all true sentences in that language, or alternatively an explanation based on reasonable principles for which such devices should have zero probabilities. Right now we do not have either, and your research program for improving SI (i.e., to base it on second-order logic) will not give us either even if it’s successful. So while I’m not sure it makes sense to say I “blame” Solomonoff induction (what could that mean?), you could say that I’m not satisfied with either the status quo or any improvements to it that we can currently foresee.
  - Eliezer Yudkowsky 18 Aug 2012 7:51 UTC
    2 points
    0
    Parent
    Give me a set of formal languages over which you can say the phrase “for any formal language”, and the truth predicate for the union of the set won’t be in any language in the set. I’m still trying to understand how to deal with this inside AI, but I’m not sure that blaming it on second-order logical induction is putting the blame in the right place.
    - Wei Dai 18 Aug 2012 8:31 UTC
      0 points
      0
      Parent
      Again, I’m not sure what you mean by “blame” here. If you’re saying that Tarski’s result represents a problem that affects more than just attempts to generalize Solomonoff induction, then I agree.
      
      BTW, while I have your attention, what’s your evaluation of Paul Christiano’s FAI design idea, which sort of tries to punt as many philosophical problems as possible (including this one)? I noticed that you didn’t comment in that discussion.
3p1cd3m0n 4 Jan 2015 20:12 UTC
1 point
0
If I understand Solomonoff Induction correctly, for all n and p the the sum of the probabilities of all the hypotheses of length n equals the sum of the probabilities of hypotheses of length p. If this is the case, what normalization constant could you possibility use to make all the probabilities sum to one? It seems there are none.
- pengvado 7 Jan 2015 12:58 UTC
  3 points
  0
  Parent
  Use a prefix-free encoding for the hypotheses. There’s not 2^n hypotheses of length n: Some of the length-n bitstrings are incomplete and you’d need to add more bits in order to get a hypothesis; others are actually a length <n hypothesis plus some gibberish on the end.
  
  Then the sum of the probabilities of all programs of all lengths combined is 1.0. After excluding the programs that don’t halt, the normalization constant is Chaitin’s Omega.
  - 3p1cd3m0n 10 Jan 2015 20:25 UTC
    2 points
    0
    Parent
    Unfortunately Chaitin’s Omega’s incomputable, but even if it wasn’t I don’t see how it would work as a normalizing constant. Chaitin’s Omega is a real number, there is an infinite number of hypotheses, and (IIRC) there is no real number r such that r multiplied by infinite equals one, so I don’t see how Chaitin’s Omega could possible work as a normalizing constant.
    - Kindly 11 Jan 2015 6:12 UTC
      0 points
      0
      Parent
      If we assign a value of 2^-n to each hypothesis (that is, each halting program) of length n, then the total value summed over all hypotheses is Chaitin’s Omega.
      
      Thus, if we assign a value of 2^-n/Omega to each hypothesis, the total is 1, and this gives us a valid probability distribution.
      - 3p1cd3m0n 11 Jan 2015 20:53 UTC
        0 points
        0
        Parent
        That would assign a zero probability of hypotheses that take more than n bits to specify, would it not? That sounds far from optimal.
        Kindly 12 Jan 2015 2:02 UTC
        0 points
        0
        Parent
        Sorry, I mean that we do this for each value of n. So, given a hypothesis H, it is assigned a probability of 2^-length(H)/Omega.
        3p1cd3m0n 14 Jan 2015 1:50 UTC
        0 points
        0
        Parent
        I still don’t see how that would make all the hypotheses sum to 1. Wouldn’t that only make the probabilities of all the hypotheses of length n sum to 1, and thus make the sum of all hypotheses sum to > 1? For example, consider all the hypotheses of length 1. Assuming Omega = 1 for simplicity, there are 2 hypotheses, each with a probability of 2^-1/1 = 0.5, making all them sum to 1. There are 4 hypotheses of length 2, each with a probability of 2^-2/1 = 0.25, making them sum to 1. Thus, the sum of the probabilities of all hypotheses of length ⇐ 2 = 2 > 1.
        
        Is Omega doing something I don’t understand? Would all hypotheses be required to be some set length?
        gjm 14 Jan 2015 2:40 UTC
        2 points
        0
        Parent
        I think you may be missing two things.
        
        First: the hypotheses/programs (recall: we are representing hypotheses by computer programs in some suitable language) need to be written in a “prefix-free” representation, meaning one in which e.g. if 100111 is a valid program then nothing else beginning 100111 can be. So you don’t get two programs of length 1, four of length 2, etc. If you do this, and if you also arrange your coding so that every infinite string of bits starts with some legal program, then the sum over all legal programs of 2^-length is exactly 1. (So Pr(program) = 2^-length(program) defines a probability distribution over programs. It’s the same probability distribution as you get from the following procedure: keep generating random bits until the bits you’ve generated form a legal program.)
        
        Second: you can’t just take Ω=1 “for simplicity”; Ω is defined to be what you get by adding up 2^-length for all programs that halt. (Remember that hypotheses are being represented as computer programs here.) Equivalently, if you imagine picking a random well-formed program by choosing random bits, Ω is the probability that the resulting program halts. So it’s certainly <1.
        
        [EDITED to fix an ugly but hopefully not confusing typo.]
        What links here?
        Kindly's comment on Open Problems Related to Solomonoff Induction by Wei Dai (16 Jan 2015 3:58 UTC; 0 points)
        Kindly 14 Jan 2015 2:52 UTC
        1 point
        0
        Parent
        The hypotheses are encoded in a prefix-free way: no hypothesis is a prefix of another hypothesis.
        
        In particular, this eliminates your example: if both strings of length 1 (“0” and “1“) are valid hypotheses, then no string of longer length can be a valid hypothesis (then we can take Omega=1 and assign a probability of ¹⁄₂ to each). Or we could have “0”, “10”, and “110” be valid hypotheses; then we can take Omega = ⁷⁄₈ and assign probabilities of ⁴⁄₇, ²⁄₇ and ¹⁄₇ to these.
        
        In general, the prefix-free condition ensures that the sum over all hypotheses converges, by Kraft’s inequality, which is the real heavy lifter here; the constant is merely there to make the sum over all hypotheses converge to 1 instead.
        
        You might imagine that hypotheses are required to be grammatically correct English sentences (I guess encoded in ASCII or something). In that case, no hypothesis is a prefix of another, because each hypothesis ends with a period.
        3p1cd3m0n 15 Jan 2015 1:35 UTC
        0 points
        0
        Parent
        Right; I forgot that it used a prefix-free encoding. Apologies if the answer to this is painfully obvious, but does having a prefix-free encoding entail that there is a finite number of possible hypotheses?
        Kindly 15 Jan 2015 2:21 UTC
        0 points
        0
        Parent
        It doesn’t: for example, the set of strings 0, 10, 110, 1110, 11110, 111110, … is infinite and prefix-free. So is the set of C-style (null-terminated) strings.
        3p1cd3m0n 16 Jan 2015 3:50 UTC
        0 points
        0
        Parent
        I see. Does the method of normalization you gave work even when there is an infinite number of hypotheses?
        Expand this thread
        Kindly 16 Jan 2015 3:58 UTC
        0 points
        0
        Parent
        It does. The infinite sum will always converge, so by normalizing we can make it converge to 1.
        
        Take gjm’s approach to explaining the normalization, in which the initial weight of 2^-length assigned to a hypothesis is the probability that you obtain that hypothesis by choosing bits at random. Then the normalization is just a conditional probability: you divide by the probability that, when choosing bits at random, you do eventually end up hitting a hypothesis.
        gjm 15 Jan 2015 13:29 UTC
        0 points
        0
        Parent
        Remark: these are special cases of the schema “all strings that contain exactly one instance of pattern X in possible positions Y”, where in the first case X is “0“ and Y is “anywhere” and in the second X is “00000000” and Y is “any position that’s a multiple of 8 bits”.
        
        (Of course there are plenty of other prefix-free sets of bitstrings. For instance: interpret blocks of 8 bits as characters as in Kindly’s second example, and say that a valid program is anything that begins with a “(”, has properly matched parentheses, and ends with the ”)” matching the first “(”. This doesn’t have the convenient property that every bitstring begins with a valid program; for examples that do, take the exponential Golomb coding or the Elias omega coding of natural numbers.
[ ]
[deleted]
- Eliezer Yudkowsky 17 Aug 2012 21:52 UTC
  1 point
  0
  Parent
  A kilobit of improbability requires only a kilobit of data to offset it, which isn’t very crackpot at all. Proving minimum length is impossible, but proving an upper bound on length is very easy, and that proves a lower bound on probability.
[ ]
[deleted]
- Patrick 17 Jun 2012 11:54 UTC
  1 point
  0
  Parent
  I take your point re: length vs speed. The theorem that I think justifies calling Kolmogorov Complexity objective is this:
  
  “If K1 and K2 are the complexity functions relative to description languages L1 and L2, then there is a constant c (which depends only on the languages L1 and L2) such that |K1(s) - K2(s)| ⇐ c for all strings s.”
  
  (To see why this is true, note that you can write a compiler for L2 in L1 and vice versa)
  
  I don’t see why code modelling symmetric laws should be longer than code modelling asymmetric laws (I’d expect the reverse; more symmetries means more ways to compress.) Nor why 3 spatial dimensions (or 10 if you ask string theorists) is the minimum number of spatial dimensions compatible with intelligent life.
  
  The whole point of Solomonoff induction is that the priors of a theory are not arbitrary. They are determined by the complexity of the theory, then you use Bayes rule on all theories to do induction.
G0W51 18 Feb 2015 3:21 UTC
0 points
0
Here’s an idea for a modification of Solomonoff induction that no longer has a subjectively-chosen machine to encode hypotheses in. One could instead simply consider how many bits it would take to encode a solution on all universal Turing machines, and making each hypotheses’ prior equal to the average of its prior on each machine. This makes somewhat intuitive sense, as one doesn’t know which “machine” the universe in “programmed” on, so it’s best to just assume a random machine. Unless I’m mistaken, there’s an infinite number of universal Turing machines, but I think algorithms could still approximate the induction.
- Cole Wyeth 1 Jun 2023 12:41 UTC
  1 point
  0
  Parent
  How would you take the average over an infinite number of UTM’s? You would first need to choose a distribution on them.
3p1cd3m0n 4 Jan 2015 20:22 UTC
0 points
0
Is there any justification that Solomonoff Induction is accurate, other than intuition?
- Username 10 Jan 2015 4:39 UTC
  0 points
  0
  Parent
  of course not
  - 3p1cd3m0n 10 Jan 2015 22:15 UTC
    0 points
    0
    Parent
    “Of course” implies that the answer is obvious. Why is it obvious?
    - kokotajlod 23 Dec 2016 18:36 UTC
      0 points
      0
      Parent
      Dunno what Username was thinking, but here’s the answer I had in mind: “Why is it obvious? Because the Problem of Induction has not yet been solved.”
TimFreeman 23 Jun 2013 18:34 UTC
0 points
0
Suppose we define a generalized version of Solomonoff Induction based on some second-order logic. The truth predicate for this logic can’t be defined within the logic and therefore a device that can decide the truth value of arbitrary statements in this logical has no finite description within this logic. If an alien claimed to have such a device, this generalized Solomonoff induction would assign the hypothesis that they’re telling the truth zero probability, whereas we would assign it some small but positive probability.

I’m not sure I understand you correctly, but there are two immediate problems with this:
- If the goal is to figure out how useful Solomonoff induction is, then “a generalized version of Solomonoff Induction based on some second-order logic” is not relevant. We don’t need random generalizations of Solomonoff induction to work in order to decide whether Solomonoff induction works. I think this is repairable, see below.
- Whether the alien has a device that does such-and-such is not a property of the world, so Solomonoff induction does not assign a probability to it. At any given time, all you have observed is the behavior of the device for some finite past, and perhaps what the inside of the device looks like, if you get to see. Any finite amount of past observations will be assigned positive probability by the universal prior so there is never a moment when you encounter a contradiction.
If I understand your issue right, you can explore the same issue using stock Solomonoff induction: What happens if an alien shows up with a device that produces some uncomputable result? The prior probability of the present situation will become progressively smaller as you make more observations and asymptotically approach zero. If we assume quantum mechanics really is nondeterministic, that will be the normal case anyway, so nothing special is happening here.
roll 8 Jun 2012 9:12 UTC
0 points
0

What does Solomonoff Induction actually say?

I believe this one has been closed ages ago by Alan Turing, and practically demonstrated for approximations by the investigation into busy beaver function for example. We won’t be able to know BB(10) from God almighty. Ever.
- koning_robot 30 Aug 2013 17:56 UTC
  0 points
  0
  Parent
  I’m not sure what you’re trying to say here, but if you consider this a relative weakness of Solomonoff Induction, then I think you’re looking at it the wrong way. We will know it as well as we possibly could given the evidence available. Humans are subject to the constraints that Solomonoff Induction is subject to, and more.
  - koning_robot 30 Aug 2013 18:07 UTC
    −2 points
    0
    Parent
    Whoops, thread necromancy.
Steven_Bukal 8 Jun 2012 5:01 UTC
0 points
0

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity.

I’m very confused about this myself, having just read this introductory paper on the subject.

My understanding is that an “ideal” reference UTM would be a universal turing machine with no bias towards any arbitrary string, but rigorously defining such a machine is an open problem. Based on our observation of UTMs, the more arbitrary simplifications a Turing Machine makes, the longer its compiler will have to be on other UTMs. This is called the Short Compiler Assumption. Since we can’t agree on a single ideal reference UTM, we instead approximate it by limiting ourselves to a class of “natural” UTMs which are mutually interpretable within a constant. The smaller the constant, the less arbitrary simplification the UTMs in the class will tend to make.

This seems to mesh with the sequences post on Occam’s Razor:

What if you don’t like Turing machines? Then there’s only a constant complexity penalty to design your own Universal Turing Machine that interprets whatever code you give it in whatever programming language you like.

What I’m confused about is this constant penalty. Is it just “some constant” or is it knowable and small? Is there a specific UTM for which we can always write a short compiler on any other UTM?

I’m getting out of my league here, but I’d guess that there is an upper bound on how complex you can make certain instructions across all UTMs because UTMs must (a) be finite, and (b) at the lowest level implement a minimal set of instructions, including a functionally full set of logical connectives. So for example, say I take as my “nonbiased” UTM a UTM that aside from the elementary operations of the machine on its tape, jump instructions, etc. has only a minimal number of instructions implementing a minimally complete operator set with less than two connectives: {NAND} or {NOR}. My understanding is that anything that’s a Universal Turing Machine is going to have to itself have a small number of instructions that implement the basic machine instructions and a complete set of connectives somewhere in its instruction set, and converting between {NAND} or {NOR} and any other complete set of connectives can be done with a trivially short encoding.
- pengvado 9 Jun 2012 11:40 UTC
  1 point
  0
  Parent
  
  Since we can’t agree on a single ideal reference UTM, we instead approximate it by limiting ourselves to a class of “natural” UTMs which are mutually interpretable within a constant.
  
  Even if you do that, you’re left with an infinite number of cliques such that within any given clique the languages can write short interpreters for each other. Picking one of the cliques is just as arbitrary as picking a single language to begin with. i.e. for any given class of what-we-intuitively-think-of-as-complicated programs X, you can design a language that can concisely represent members of X and can concisely represent interpreters with this special case.
  
  What I’m confused about is this constant penalty. Is it just “some constant” or is it knowable and small?
  
  It’s a function of the language you’re writing an interpreter of and the language you’re writing it in. “Constant” in that it doesn’t depend on the programs you’re going to run in the new language. i.e. for any given pair of languages there’s a finite amount of disagreement between those two versions of the Solomonoff prior; but for any given number there are pairs of languages that disagree by more than that.
  
  Is there a specific UTM for which we can always write a short compiler on any other UTM?
  
  No. There’s no law against having a gigabyte-long opcode for NAND, and using all the shorter opcodes for things-we-intuitively-think-of-as-complicated.
  - Steven_Bukal 13 Jun 2012 5:00 UTC
    0 points
    0
    Parent
    So is there then a pragmatic assumption that can be made? Maybe we assume that if I pick a turing machine blindly, without specifically designing it for a particular output string, it’s unlikely to be strongly biased towards that string.
    - pengvado 14 Jun 2012 9:18 UTC
      3 points
      0
      Parent
      What probability distribution over turing machines do you blindly pick it from? That’s another instance of the same problem.
      
      Pragmatically, if I non-blindly pick some representation of turing machines that looks simple to me (e.g. the one Turing used), I don’t really doubt that it’s within a few thousand bits of the “right” version of solomonoff, whatever that means.
      - Steven_Bukal 17 Jun 2012 19:32 UTC
        2 points
        0
        Parent
        
        Pragmatically, if I non-blindly pick some representation of turing machines that looks simple to me (e.g. the one Turing used), I don’t really doubt that it’s within a few thousand bits of the “right” version of solomonoff, whatever that means.
        
        Why not?
timtyler 7 Jun 2012 0:06 UTC
0 points
0

Apparent Unformalizability of “Actual” Induction

Re: Tarski’s Indefinability of Truth—I don’t get this one. Solomonoff Induction estimates probabilities of bitstring sequences—but it’s uncomputable—and the best we can do is finite approximations—which of course have limitations and will be able to be improved upon—just as Tarski’s “Indefinability of Truth” says.

Re: Argument via Berry’s Paradox—but the description of x includes P—which could be complex. So the description of x is not short—if you are not given P.
timtyler 6 Jun 2012 23:59 UTC
0 points
0

Is complexity objective?

Kinda. Some notions of complexity are more useful than others. However, condition your machine on some real world sense data for a while, and any reasonable prior soon gets swamped by data—so in practice, this is not a big deal.
- Username 10 Jan 2015 4:40 UTC
  0 points
  0
  Parent
  If it was a matter of gathering enough data we wouldn’t have to talk about Solomonoff induction at all.
timtyler 6 Jun 2012 23:57 UTC
0 points
0

How can we apply Solomonoff when our inputs are not symbol strings?

Inputs can always be represented by bit sequences. Qualia are an irrelevance. Q.E.D.
- torekp 10 Jun 2012 0:02 UTC
  0 points
  0
  Parent
  Doesn’t the “irrelevance” depend on an assumption about how our minds work? If we have some analog computation going on, the bit sequences might be a mere approximation to our actual inductive reasoning. Of course, once we put our conclusions into language, those conclusions can be digitized—but that might happen at a relatively late stage of the reasoning game.
  - timtyler 10 Jun 2012 1:28 UTC
    0 points
    0
    Parent
    Analog computation is another irrelevance—analog systems can always be approximated arbitrarily closely by discrete systems. Witness the ongoing digial revolution.
    - torekp 10 Jun 2012 2:09 UTC
      0 points
      0
      Parent
      Sure, if you are designing a system from the ground up. I’ve never needed more than 12 bits of precision for any real engineering purpose, and I wouldn’t pay a single dollar for more. But Wei_Dai’s question was about us—or at least reads literally that way. Maybe I took it too literally.
      - timtyler 10 Jun 2012 11:45 UTC
        0 points
        0
        Parent
        There’s no credible evidence that we aren’t digital. E.g. see digital physics. If nobody can tell the difference, the issue is probably in the realm of vacuous philosophy..
witzvo 6 Jun 2012 7:03 UTC
0 points
0
I’m not sure what you have in mind by basing Solomonoff induction on a second-order logic. Can you give a reference?

Here is what I’d suggest as a baseline. Use John Tromp’s Binary Lambda Calculus with some variation of Levin Search to explore time/complexity bounded programs that output some initial sequence of data.
Incorrect 6 Jun 2012 1:56 UTC
0 points
0

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?

Perhaps we could formalize some argument that for an arbitrarily poor definition of complexity, the error becomes negligible for the average sufficiently complex universe?
Incorrect 6 Jun 2012 1:44 UTC
0 points
0

Solomonoff Induction is defined over symbol strings (for example bit strings) but our perceptions are made of “qualia” instead of symbols. How is Solomonoff Induction supposed to work for us?

Our perceptions have some neurological representation.
Incorrect 6 Jun 2012 1:52 UTC
−1 points
0

What does Solomonoff Induction actually say about, for example, whether we live in a creatorless universe that runs on physics?

The entirely depends on your definition of creator. Traditional creators such as the Christian god could potentially have enough explanatory power once properly defined, yet would end up horrendously complex (encoding an entire human-like mind into the primitive natural law) unless you further reduce the construction of the creator to a natural process.

Or the Simulation Argument?

Solomonoff Induction is empirical: unless the laws of our universe are broken by the simulators, Solomonoff Induction says nothing about whether we are in a simulation. If the simulators have not influenced our universe but will in the future, it depends entirely on whether the simulator universe is simpler than ours. If the simulators have already influenced our universe it’s more complicated.
- Will_Newsome 6 Jun 2012 12:31 UTC
  0 points
  0
  Parent
  People are perfectly fine with fuzzy approximate explanations of phenomena, like Maxwell’s equations &c. “Goddidit” is not that different. Trying to get a full causal explanation would mean finding bits of Omega. In the end, decision theory is fundamental, and epistemological abstractions like SI are cool but ultimately irrelevant. This whole “encoding a human-like mind” thing doesn’t work like you think it does—you can interpret SI that way and see some cool implications, just remember it’s a useless toy model. …Just sayin’.
  - Incorrect 6 Jun 2012 16:13 UTC
    1 point
    0
    Parent
    
    “Goddidit” is not that different.
    
    Physics theories import low-complexity mathematical models. “Goddidit” imports complicated human notions of agency. Approximate explanations are fine if we can reason that their implicit complexity is low relative to their explanatory power (a relatively easily satisfied metric, after which competition between theories becomes the key factor).
    
    In Solomonoff Induction, theories that don’t explain data must contain that data raw.
    - Will_Newsome 6 Jun 2012 16:32 UTC
      3 points
      0
      Parent
      
      Physics theories import low-complexity mathematical models. “Goddidit” imports complicated human notions of agency.
      
      Frankly, I think this idea is attractive but ultimately an error. It is indeed true that to an analytical mind with an interest in physics, mathematics feels a lot less complex, in some sense, than intuitive notions of agency. But no matter how much physics or psychology you know, you don’t have introspective access to the universal prior—maybe the prior privileges math over psychology, or maybe it doesn’t. All we have is our evidence, often in the form of conclusions drawn from intuitive analyses of what hypotheses have or haven’t tended to bear intellectual or instrumental fruit in the past—id est, we’re awfully close to talkin’ ’bout pragmatics and decision theory here. And yes, mathematical explanations have been surprisingly effective. But if you look at human history, hypotheses that make use of “complicated human notions of agency” have also been pretty damn effective. It’s not obvious what notion of complexity would massively privilege the former over the latter, and at any rate, we have no way of knowing, because you can’t find the universal prior in your backyard.
      - Incorrect 6 Jun 2012 16:56 UTC
        1 point
        0
        Parent
        
        It is indeed true that to an analytical mind with an interest in physics, mathematics feels a lot less complex,
        
        We have objective verification of the low complexity of formalized mathematical theories because we can look at the length of their formal description in say, first-order logic.
        
        But no matter how much physics or psychology you know, you don’t have introspective access to the universal prior—maybe the prior privileges math over psychology, or maybe it doesn’t.
        
        Are you really suggesting some model of computation based on human ideas might work better than say, lambda calculus for computing Kolmogorov complexity for Solomonoff Induction? I’m not sure how to argue with that but I would appreciate it if you would state it explicitly.
        Will_Newsome 6 Jun 2012 17:14 UTC
        0 points
        0
        Parent
        
        We have objective verification of the low complexity of formalized mathematical theories because we can look at the length of their formal description in say, first-order logic.
        
        Right, and that’ll be important if we ever run into aliens that for some reason can’t wrap their brains around English, but instead can figure out our category theory notation and so on. Or if we’re trying to build an FAI, or collaborate with the aforementioned aliens to build FAI.
        
        I’m not sure how to argue with that but I would appreciate it if you would state it explicitly.
        
        Apologies, inferential distance, and there’s a few meta-level points that I think are important to communicate. But there’s inferential distance on the meta level too.
    - Will_Newsome 6 Jun 2012 16:33 UTC
      0 points
      0
      Parent
      Also keep in mind that algorithmic information/probability theory is actually quite hard to interpret correctly—the basic, intuitive way to read meaning into the math is not quite the way to do it. cousin_it has a post or two correcting some intuitive errors of interpretation.
      What links here?
      Incorrect's comment on Rationality Quotes September 2012 by Jayson_Virissimo (9 Sep 2012 4:18 UTC; 5 points)
      - Incorrect 6 Jun 2012 16:59 UTC
        1 point
        0
        Parent
        I found these:
        
        Intuitive Explanation of Solomonoff Induction—lukeprog
        
        Does Solomonoff always win? - cousin_it
        
        K-complexity of everyday things—cousin_it
        
        Solomonoff Induction, by Shane Legg—cousin_it
        
        I would appreciate it if people could link me to more.
        Will_Newsome 6 Jun 2012 17:06 UTC
        2 points
        0
        Parent
        Alas, none of those are the relevant ones I think. I’m actually rather busy visiting home, so I can only justify certain comments to myself, but I hope someone provides the links.
        
        For what it’s worth, I’m a little skeptical of lukeprog’s understanding of SI—no offense to him meant, it’s just I so happen to believe he made a rather big error when interpreting the math. On the other hand, cousin_it seems to be really on the ball here. Those are just my impressions; I’m a pretend philosopher, not a compsci dude. At any rate I think it’d be just dandy for cousin_it to check Luke’s posts and share his impression or critiques.
        Will_Newsome 6 Jun 2012 17:23 UTC
        −1 points
        0
        Parent
        Here’s one I was thinking of:
        
        The prior of a hypothesis does not depend on its complexity—cousin_it
        
        (If I recall, Nesov’s comment clearly demonstrates the important point.)
        timtyler 7 Jun 2012 0:40 UTC
        0 points
        0
        Parent
        That post seems to mix together the concept of a prior with the concept of experience.
        Will_Newsome 6 Jun 2012 18:35 UTC
        −2 points
        0
        Parent
        http://lesswrong.com/lw/328/description_complexity_an_apology_and_note_on/
        Will_Newsome 6 Jun 2012 18:38 UTC
        2 points
        0
        Parent
        
        we can adopt the general rule that mentioning K-complexity in a discussion of physics is always a sign of confusion :-)
        
        Mentioning it anywhere except algorithmic information theory is a sign of confusion. This includes theology and parapsychology. Use just Bayes or, if you want to be all fancy, updateless-like decision theories. I love algorithmic probability to death but it’s just not something you should use casually. Too many pitfalls.
        Eugine_Nier 7 Jun 2012 2:04 UTC
        −2 points
        0
        Parent
        
        Use just Bayes
        
        Bayes requires a prior.
        Will_Newsome 7 Jun 2012 3:57 UTC
        4 points
        0
        Parent
        No one should ever need to discuss “priors”. Focus on the likelihood ratio.
        timtyler 12 Jun 2012 1:10 UTC
        0 points
        0
        Parent
        ...but that’s like comparing apples and cheese!
  - Viliam_Bur 6 Jun 2012 14:30 UTC
    0 points
    0
    Parent
    
    People are perfectly fine with fuzzy approximate explanations of phenomena, like Maxwell’s equations &c.
    
    Approximate explanations have some predictive power.
    
    “Goddidit” is not that different.
    
    What experience do you expect if “Goddidit”, as opposed to if “Goddidntdoit”?
    
    (Skeletons of angels versus skeletons of dinosaurs? People with supernatural powers versus people working with superstition? Benevolent universe versus indifferent universe?)
    - Will_Newsome 6 Jun 2012 16:22 UTC
      0 points
      0
      Parent
      
      If in your heart you believe you already know, or if in your heart you do not wish to know, then your questioning will be purposeless and your skills without direction.
      
      —Twelve Virtues of Rationality
      
      It’s just, I’m having an amazing time back home, and my time is limited. I don’t know your goals, but you might want to try harder to signal that you’re really curious and not just asking questions that you think are rhetorical. When you reference common knowledge ’round these parts, like Eliezer’s posts, you should expect that the other person is already aware of that knowledge, and that they have real, substantive reasons to think that what they said is not entirely refuted by the contents of said common knowledge.
      
      Of course, asking rhetorical questions is a perfectly decent way to make an argument. It’s just that arguments in that sense aren’t quite what’s called for in situations like these, I think. But that might just be a difference in our epistemic styles, especially if you’re Slavic. (Gasp, racism! ;P )
      - Viliam_Bur 6 Jun 2012 16:55 UTC
        0 points
        0
        Parent
        
        When you reference common knowledge ’round these parts, like Eliezer’s posts, you should expect that the other person is already aware of that knowledge
        
        Good point.
        
        Also good point about time being limited, so...
        
        If you’d someday later feel like writing a LW article about similarities between “Goddidit” and Maxwell’s equations, or something like that, I will read it.
private_messaging 16 Jun 2012 17:56 UTC
−2 points
0
The interesting bit is that if you could actually use Solomonoff induction as prior, I’m pretty sure you’d be an irreparable aether believer and ‘relativity sceptic’, with confidence that has so many nines nothing would ever convince you. That’s because the absolute-speed irreversible aether universes like 3d versions of Conway’s game of life are so much more computationally compact than highly symmetric universe with many space-time symmetries (to the point of time and space intermixing in the equations) and the fully reversible fundamental laws of physics. Saying this to you as applied mathematician with experience actually implementing laws of physics in software.

The priors are only a free parameter if all you care for is not getting Dutch-booked on your bets about physics but do not give a damn to have any useful theories.
- Wei Dai 17 Jun 2012 10:31 UTC
  3 points
  0
  Parent
  
  That’s because the absolute-speed irreversible aether universes like 3d versions of Conway’s game of life are so much more computationally compact than highly symmetric universe with many space-time symmetries (to the point of time and space intermixing in the equations) and the fully reversible fundamental laws of physics.
  
  Can you explain why this is the case? Also, when you say “aether” universes are more computationally compact than “relativity” universes, is this before or after taking into account our observations (i.e., are you restricting your attention to universes that fit our observations, or not)?
  
  Saying this to you as applied mathematician with experience actually implementing laws of physics in software.
  
  Is it possible that what you said is true only if we want the laws of physics to run fast on current computers? I’m afraid that most software people, including myself, probably have bad intuitions about Solomonoff Induction because we only have experience with a very small subset of possible computations, namely those that can be run economically on modern computers. Perhaps the laws of physics can be implemented much more compactly if we ignored efficiency?
  - private_messaging 17 Jun 2012 12:56 UTC
    −2 points
    0
    Parent
    
    Can you explain why this is the case?
    
    As intuition pump consider the reversible vs irreversible cellular automata. If you pick at random, vast majority will not be reversible. Ditto for the symmetries. (Keep in mind that in Solomonoff probability we feed infinite random tape to the machine. It is no Occam’s razor. Elegant simplest deterministic things can be vastly outnumbered by inelegant, even if they are most probable. edit: that is to say you can be more likely to pick something asymmetric even if any particular asymmetric is less likely than symmetric)
    
    Also, when you say “aether” universes are more computationally compact than “relativity” universes, is this before or after taking into account our observations (i.e., are you restricting your attention to universes that fit our observations, or not)?
    
    There can always be a vast conspiracy explaining the observations… ideally if you could simulate whole universe (or multiverse) from big bang to today and pick out the data matching observations or the conspired lying, then maybe it’d work, but the whole exercise of doing physics is that you are embedded within universe you are studying. edit: and that trick won’t work if the code eats a lot of random tape.
    
    Is it possible that what you said is true only if we want the laws of physics to run fast on current computers?
    
    I don’t think relaxing the fast requirement really helps that much. Consider programming Conway’s game of life in Turing machine. Or vice versa. Or the interpreter for general TM on the minimal TM. It gets way worse if you want full rotational symmetry on discrete system.
    
    Of course, maybe one of the small busy beavers is a superintelligence that likes to play with various rules like that. Then I’d be wrong. Can not rule even this possibility out. Kolmogorov/Solomonoff name drop is awesome spice for cooking proven-untestable propositions.
    
    One could argue that second order logic could work better, but this is getting way deep into land of untestable propositions that are even proven untestable, and the appropriate response would be high expectations asian father picture with “why not third order logic?”.
    
    edit: also you hit nail on the head on an issue here: i can not be sure that there is no very short way to encode something. You can ask me if I am sure that busy beaver 6 is not anything, and I am not sure! I am not sure it is not the god almighty. The proposition that there is a simple way is a statement of faith that can not be disproved any more than existence of god. Also, I feel that there has to be scaling for the computational efficiency in the prior. The more efficient structures can run more minds inside. Or conversely, the less efficient structures take more coding to locate minds inside of them.
- Mitchell_Porter 17 Jun 2012 0:21 UTC
  3 points
  0
  Parent
  Discrete universes introduce an unobserved parameter (a fundamental length) as well as a preferred frame. Abandoning complex numbers for a countable group in QM would also be very difficult. There is a complexity theory for real-valued objects and if Solomonoff induction could be adapted to that context, continuum theories ought to be favored.
  - private_messaging 17 Jun 2012 5:30 UTC
    1 point
    0
    Parent
    Yep. Well, in any case, the point is that the Solomonoff probability is not in any way ‘universal prior’, you have to pick the machine, then you can’t actually compute anything useful because its uncomputable and you’ll never get anything non-trivial all the way down to minimum length.
    
    If you go for changing the machine, you could use laws of physics as you know them as the machine, too (then you’ll be irreparable sceptic of any new physics). In the context of laws of physics its just the emulation of one universal computation (laws of physics) with another (what ever stuff you pick). We are only discussing this because some highly self-important people heard of Solomonoff induction and Kolmogorov complexity, modelled those with something ill defined and fuzzy (lacking not only sufficient understanding of those things but the understanding of the concepts in which to understand those, and so on several layers deep, as is usually the case when you choose something really complicated without having spent a lot of time studying towards it) and then used it as universal smart sounding word to say (e.g. here or here or here ). Name-dropping and nothing more.
- Plasmon 16 Jun 2012 19:05 UTC
  0 points
  0
  Parent
  Consider Penrose’s “Angular momentum: An approach to combinatorial space-time” (math.ucr.edu/home/baez/penrose/Penrose-AngularMomentum.pdf)
  
  Hence, we have a way of getting hold of the concept of Euclidean angle, starting from a purely combinatorial scheme
  
  ...
  
  The central idea is that the system defines the geometry. If you like, you can use the conventional description to fit the thing into the ‘ordinary space-time’ to begin with, but then the geometry you get out is not necessarily the one you put into it
  
  It seems that euclidean space, at least, can be derived as a limiting case from simple combinatorial principles. It is not at all clear that general relativity does not have kolmogorov complexity comparable to the cellular automata of your “aether universes”.
  - private_messaging 16 Jun 2012 19:55 UTC
    0 points
    0
    Parent
    Kolmogorov complexity of GR itself (text of GR or something) is irrelevant. Kolmogorov complexity of universe that has the symmetries of GR and rest of physics, is. Combinatorial principles are nice but it boils down to representing state of the universe with cells on tape of linear turing machine.
[deleted] 24 Aug 2012 8:05 UTC
−3 points
0

Solomonoff Induction is supposed to be a formalization of Occam’s Razor, and it’s confusing that the formalization has a free parameter in the form of a universal Turing machine that is used to define the notion of complexity. What’s the significance of the fact that we can’t seem to define a parameterless concept of complexity? That complexity is subjective?

In the Kolmogorov-Chaitin Minimum description length approach, the subject must pick a Turing machine whose operations describe the basic operations believed to represent ‘simplicity’ by the subject.

Kant postulated that all knowledge relies on synthetic, a priori laws of nature, like causality and substance. The problem, then, is how this is possible. Kant’s solution was to reason that the subject must supply laws that make experience of objects possible, and that these laws are the synthetic, a priori laws of nature that we know apply to all objects before we experience them.

Kants apriori categories could be equivalent to the ‘basic operations’ of the aforementioned Turing machine. In the language of computer science, this would be equivalent to specifying an upper ontology defining all the fundamental ontological primatives in the domain ‘reality’.