That these functions need to be “taken over” by someone else is exactly what signals the existence of a problem. They are taken over in the same way that, in computer science, programming language design was taken over by smart hackers like van Rossum, operating system design was taken over by smart hackers like Torvalds, search engine design was taken over by smart hackers like Brin & Page, etc. - you get the point.
This post reminded me of this interview with Jeremy Howard, a multiple-time winner of Kaggle data prediction contests. The article is titled “Specialist Knowledge Is Useless and Unhelpful” and includes this:
Q. How have experts reacted?
A. The messages are uncomfortable for a lot of people. It’s controversial because we’re telling them: “Your decades of specialist knowledge are not only useless, they’re actually unhelpful; your sophisticated techniques are worse than generic methods.”
It’s only anecdotal evidence, but still; I think that becoming a domain expert always comes at the cost of a certain inertia in thinking. “If all you have is a hammer, every problem looks like a nail.” An abacus expert may have spent years honing his techniques, but that would not help him much in a contest against someone equipped with a computer. The expert would be better off if he had switched to the computer himself; but switching was psychologically hard, because he had so much more (subjectively) to lose, compared with unskilled novices.
“Bad” steelmanning: a form of misunderstanding your opponent (as in the Roman example).
“Good” steelmanning: marshalling the best form of the argument against your position and defeating it. Also known as charitable interpretation.
I don’t think steelmanning is particularly dangerous. It should be quite easy to recognize and avoid “bad” steelmanning, which is the whole source of the danger. If the Roman is truly a rationalist, he should be aware of his very limited knowledge of the modern society and the dangers of substituting an argument in his situation. Steelmanning in his situation is a clear example of irrational behavior.
Do you mean “hypothesis” as something that solves the problem?
If yes, then there’s a problem. Either your beliefs are inconsistent (as they don’t add up to 100%), or the hypotheses cannot be independent. Assuming your beliefs are consistent, your best bet would be to figure out what the correlations between these hypotheses are before choosing one to test. For example, C could be correlated with A. Choosing to pursue A (or C) would then give information about C (or A) as well.
If no, then the amounts of information you’ll get from testing them are incomparable. For example, B could be about a relatively minor thing, while A could be about 99% of the solution.
I’d select B for the “hard coding problems”, as that would give me the most information. (I’m already relatively sure that C won’t work, but I may have absolutely no idea whether B would work).
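To make “the most information” concrete: the expected information from a binary test is the Shannon entropy of its outcome, which is maximal exactly when you’re most unsure. A minimal sketch (the probabilities 0.9, 0.5 and 0.1 are hypothetical stand-ins for my beliefs in A, B and C, and the tests are assumed to be independent pass/fail checks):

```python
from math import log2

def entropy(p):
    """Shannon entropy (in bits) of a binary test with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Hypothetical prior beliefs that each hypothesis will pass its test.
beliefs = {"A": 0.9, "B": 0.5, "C": 0.1}

for name, p in beliefs.items():
    print(f"testing {name}: expected information = {entropy(p):.3f} bits")
# B, at P=0.5, yields the full 1 bit; A and C yield only ~0.47 bits each.
```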
Understanding and justifying Solomonoff induction
You’re right—Solomonoff induction is justified for any model, whenever it is computable. The exact technical details are unimportant. I was a bit confused about this point.
Essentially, Solomonoff induction “works” in the physical universe (i.e. is the best predictor) whenever:
1) there is a source of randomness
2) there are some rules
3) the universe is not hypercomputing.
If there is no source of randomness involved, the process is fully deterministic, and can be best predicted by deductive reasoning.
If there are no rules, the process is fully random. In this case just tossing a fair coin will predict equally well (with P=0.5).
If it’s hypercomputing, a “higher-order” Solomonoff induction will do better.
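A quick sanity check of the “no rules” case: on a fully random sequence, no prediction rule does better than the fair coin. A small simulation (the “majority” rule here is just an arbitrary example of a sophisticated-looking predictor):

```python
import random

random.seed(0)
N = 100_000
bits = [random.randint(0, 1) for _ in range(N)]  # a fully "lawless" bit sequence

coin_hits = majority_hits = ones_so_far = 0
for i, b in enumerate(bits):
    coin_hits += (random.randint(0, 1) == b)          # fair-coin predictor
    majority_hits += (int(ones_so_far * 2 > i) == b)  # predict the majority bit seen so far
    ones_so_far += b

print(f"coin:     {coin_hits / N:.3f}")      # ~0.500
print(f"majority: {majority_hits / N:.3f}")  # also ~0.500; the "rule" buys nothing
```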
According to the Scholarpedia article, it’s not required that the probabilities add up to 1:
∑_x m(x) < 1
It’s simpler to define the Universal Prior in such a way that non-halting programs are “not counted”. So the sum is not over all programs, just the halting ones.
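To illustrate with a made-up toy machine: on a prefix-free machine no halting program is a prefix of another, so the weights 2^-length(p) of the halting programs occupy disjoint parts of the space of coin-flip sequences, and by Kraft’s inequality they sum to at most 1. The missing mass corresponds to the non-halting programs that are “not counted”:

```python
# Hypothetical prefix-free set of halting programs for a toy machine.
# No string is a prefix of another, so by Kraft's inequality the
# weights 2^-len(p) sum to at most 1; the "missing" mass belongs
# to the non-halting programs.
halting = ["0", "10", "110"]

total = sum(2 ** -len(p) for p in halting)
print(total)  # 0.5 + 0.25 + 0.125 = 0.875 < 1
```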
How do you define an extended version of a program?
I agree with what you’re saying only as long as an “extended version” of a program is the same program, just padded with meaningless bits at the end.
I don’t agree that it’s true of longer programs in general. A “longer version” of a program is any program that produces the same output, including programs that use different algorithms, programs that are padded at the beginning, padded in the middle, etc.
To get the mathematical expression of the Universal Prior, we count all programs, and here the extended versions come into play. Intuitively, a program padded with a single bit is counted twice (as it may be padded with 0 or 1), a program padded with 2 bits is counted 4 times, etc. So a program with length L+k is 2^k times less likely than a program with length L. That’s where the 2^{-length(p)} probability comes from.
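The counting argument can be checked directly: among all bitstrings of some fixed length N, the fraction that begin with a given program p is exactly 2^-length(p). A small sketch (the programs “101” and “10110” are arbitrary examples):

```python
from itertools import product

def fraction_starting_with(p, N):
    """Fraction of all length-N bitstrings whose first len(p) bits are p."""
    strings = ["".join(bits) for bits in product("01", repeat=N)]
    return sum(s.startswith(p) for s in strings) / len(strings)

# A program of length L is "counted" once for every way of padding it to
# length N, i.e. 2^(N-L) times out of 2^N strings -- a 2^-L fraction.
print(fraction_starting_with("101", 10))    # 0.125 == 2**-3
print(fraction_starting_with("10110", 10))  # 0.03125 == 2**-5; 2 extra bits, 4x less likely
```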
you could make a version of SI that gave some long programs more weight than some short programs
+1. Hadn’t thought of that.
It’s enough to say that you are computable and therefore SI will beat you in the long run
Unless I have access to the right kind of uncomputable oracle. Which seems not totally impossible, if we’re living in an uncomputable universe.
“Higher-order” SI is just SI armed with an upgraded universal prior—one that is defined with reference to a universal hypercomputer instead of a universal Turing machine.
What do you mean, precisely, by “SI is in the same position”?
If I understand correctly, SI can be “upgraded” by changing the underlying prior.
So if we have strong reasons to suspect that we, humans, have access to a halting oracle, then we should try to build AI reasoning that approximates SI + a prior defined over a universal Turing machine enhanced with a halting oracle.
If we don’t (as is the case at the moment), then we just build AI that approximates SI + the Universal prior.
Either way, there are no guessing games in which we, on average, are able to beat the AI, as long as the approximation is good enough.
See this comment for a justification of why shorter programs are more likely.
The argument is taken from this paper.
You kind of missed my point. I provided it as a technical justification of why the exact formula 2^-length(p) makes sense. As you said, “It’s one possible way to get where you want”. I know that 2^-length(p) is not the only possible option, but it’s the most natural one, given that we know nothing about the input.
The model of computation is a “Universal Turing machine provided with completely random noise” (i.e. fair coin flips on the input tape). It’s a mathematical formalism; it has nothing to do with modern compilers.
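For what it’s worth, the formalism is easy to simulate: if the input tape is filled with fair coin flips, the probability that the machine reads exactly the bits of a program p is 2^-length(p). A sketch (the program “101” is an arbitrary example):

```python
import random

random.seed(1)

def tape_starts_with(p, trials=200_000):
    """Estimate P(the first len(p) coin flips on the input tape spell out p)."""
    want = [int(c) for c in p]
    hits = 0
    for _ in range(trials):
        # Flip fair coins for the input tape, bit by bit.
        if all(random.randint(0, 1) == bit for bit in want):
            hits += 1
    return hits / trials

print(tape_starts_with("101"))  # ~0.125 == 2**-3
```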
I thought it was implied that SI cannot access any boxes or push any buttons, because it’s a mathematical abstraction. But I see that you mean “an AI agent with SI”.
A good point!
However, if we reformulate the hypothesis as “the cause of the patient’s symptoms is cancer”, it can be treated by SI. Reductionism says that “the patient has cancer” can be translated into a statement about physical laws and elementary particles. There are problems with such a reduction, but they are of a practical nature. In order to apply SI, everything must be reduced to the basic terms. So human-identified, macro-scale patterns like “cancer” must be reduced to biochemical patterns, which in turn must be reduced to molecular dynamics, which in turn must be reduced to quantum mechanics. Doing these reductions is not possible in practice due to computational limitations, even if we knew all the laws for reduction. But in theory they are fine.
A problem of a more theoretical nature: whatever evidence we get in the real world is probabilistic. SI supposes that our observations are always 100% correct.
On the other hand, it’s intuitively obvious that if we treat some high-level concepts as irreducible entities, then a form of Solomonoff induction can be applied directly. E.g. it can be used to a priori prefer “the cause of the symptoms A, B and C is cancer” over “there are three unrelated causes a, b, and c of the three symptoms A, B, and C”.
It seems to me, however, that SI is not very useful if there are other reliable methods of determining the probabilities. For example, “the single cause of the patient’s two symptoms is bubonic plague” is, in the modern world, a hypothesis of low probability even if it is the shortest one, as the empirically-determined a priori probability of having bubonic plague is tiny.
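To put rough numbers on this (all of them invented for illustration): even if the plague hypothesis is, say, 15 bits shorter, the simplicity prior’s factor of 2^15 is easily swamped by an empirical base rate:

```python
# Invented description lengths (in bits) for the two hypotheses:
len_plague   = 40  # "the single cause of both symptoms is bubonic plague"
len_separate = 55  # "the two symptoms have two unrelated, common causes"

# Simplicity prior alone: odds of 2^(55-40) : 1 in favor of plague.
simplicity_odds = 2.0 ** (len_separate - len_plague)
print(f"simplicity prior favors plague {simplicity_odds:.0f} : 1")  # 32768 : 1

# Hypothetical empirical base rates: ~1 in 10 million for plague,
# ~1 in 100 for a pair of common unrelated ailments.
empirical_odds = 1e-7 / 1e-2
print(f"empirical prior favors plague {empirical_odds:.0e} : 1")  # 1e-05 : 1, i.e. 100,000 : 1 against
```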
An additional problem with Solomonoff induction
Yes.
You’re right that it probably doesn’t exist.
You’re wrong that no one cares about it. Humans have a long history of caring about things that do not exist. I’m afraid that hypercomputation is one of those concepts that open the floodgates to bad philosophy.
My primary point is the confusion about randomness, though. Randomness doesn’t seem to be magical, unlike other forms of uncomputability.
The humanities. Literary theory, culture and media studies, as well as philosophy (continental philosophy in particular) are fields filled with nonsense. The main problem with these fields stems from the lack or difficulty of an objective judgment, in my opinion. In literary theory, for example, it’s more important to be interesting than to be right.
I have to admit that they fail the heuristic of ideological interest as well. Even if we ignore for a moment Nobel and other prizes in literature (which have always been seriously biased), as well as culture studies in totalitarian states (where they were completely ideologized), we see that the most influential “schools” of literary theory in Western academia are ideologically charged: Feminism, Marxism, Postcolonialism etc.
It’s a shame, especially because there are more than a few low-hanging fruits in literature studies. Whatever you think about it, literary criticism has plenty of possible objectives that are both interesting and useful:
1) to help the reader select texts that merit his attention; the critic should serve the public as a filter, as an independent evaluator of texts, as someone with the hidden knowledge of how to distinguish “good” books from “mediocre” ones;
2) to help the reader gain a better understanding of existing texts;
3) to help the writer write better: deeper, clearer, using a richer set of literary devices and methods.
There are lists like 1000 books to read before you die for the first objective (which are mostly useless, but that’s not the point); there are books like Unlocking Harry Potter: Five Keys for the Serious Reader for the second; there are books like How to Write a Damn Good Novel for the third; but apparently none of these objectives is interesting enough for the literature departments of elite universities.