One-Magisterium Bayes
[Epistemic Status: Very partisan / opinionated. Kinda long, kinda rambling.]
In my conversations with members of the rationalist community, and in my reading of articles and blog posts produced both inside and outside it, I’ve noticed a recent trend toward skepticism of Bayesian principles and philosophy (see Nostalgebraist’s recent post for an example). I’ve regarded this with both surprise and a little dismay, because progress within a community tends to be indicated by moving forward to new subjects and problems rather than returning to old ones that have already been extensively argued and discussed. So the intent of this post is to summarize a few of the claims I’ve seen put forward and to point out where I believe they have gone wrong.
It’s also a somewhat odd direction for discussion to be going in, because the academic statistics community has largely moved on from the debate between Bayesian and Frequentist theory, accepting both the Bayesian and the Frequentist / Fisherian viewpoints as valid. When E.T. Jaynes wrote his famous book, the debate was still raging and many questions had yet to be answered. In the 21st century, statisticians have mostly come to accept a world in which both approaches exist and have their merits.
Because I will be defending the Bayesian side here, there is a risk that this post will come off as dogmatic. We are a community devoted to free thought, after all, and any argument toward a form of orthodoxy might be perceived as an attempt to stifle dissenting viewpoints. That is not my intent here, and in fact I plan on arguing against Bayesian dogmatism as well. My goal is to argue that having a base framework in which to place relatively high confidence is useful to the goals of the community, and that if we feel high enough confidence in it, then spending extra effort trying to prove it false might be wasting brainpower that could be used on more interesting or useful tasks. There could always come a point where most of us strongly feel that unless we abandon Bayesianism, we can’t make any further progress. I highly doubt that we have reached such a point or that we ever will.
This is also a personal exercise to test my understanding of Bayesian theory and my ability to communicate it. My hope is that if my ideas here are well presented, it should be much easier for both myself and others to find flaws with it and allow me to update.
I will start with an outline of philosophical Bayesianism, also called “Strong Bayesianism,” or, as I prefer to call it, “One Magisterium Bayes.” The reason for referring to it as a single magisterium will hopefully become clear. The Sequences did argue for this point of view; however, I think the strength of the Sequences had more to do with why you should update your beliefs in the face of new evidence than with why Bayes’ theorem is the correct way to do this. In contrast, I think the argument for Bayesian principles as the correct set of reasoning principles was made more strongly by E.T. Jaynes. Unfortunately, I feel like his exposition of the subject tends to get ignored relative to the material presented in the Sequences. Not that the information in the Sequences isn’t highly relevant and important, just that Jaynes’ arguments are much more technical, and their strength can be overlooked for this reason.
The way to start an exposition on one-magisterium rationality is by contrast to multi-magisteria modes of thought. I would go as far as to argue that the multi-magisterium view, or what I sometimes prefer to call tool-boxism, is by far the most dominant way of thinking today. Tool-boxism can be summarized by “There is no one correct way to arrive at the truth. Every model we have today about how to arrive at the correct answer is just that – a model. And there are many, many models. The only way to get better at finding the correct answer is through experience and wisdom, with a lot of insight and luck, just as one would master a trade such as woodworking. There’s nothing that can replace or supersede the magic of human creativity. [Sometimes it will be added:] Also, don’t forget that the models you have about the world are heavily, if not completely, determined by your culture and upbringing, and there’s no reason to favor your culture over anyone else’s.”
As I hope to argue in this post, tool-boxism has many downsides that should push us further towards accepting the one-magisterium view. It also very dramatically differs in how it suggests we should approach the problem of intelligence and cognition, with many corollaries in both rationalism and artificial intelligence. Some of these corollaries are the following:
If there is no unified theory of intelligence, we are led towards the view that recursive self-improvement is not possible, since an increase in one type of intelligence does not necessarily lead to an improvement in a different type of intelligence.
Diversifying the notion of correct reasoning across domains heavily limits what can be done to reach agreement on different topics. In the end we are often forced to agree to disagree, which, while preserving social cohesion in different contexts, can be quite unsatisfying from a philosophical standpoint.
Related to the previous corollary, it may lead to beliefs that are sacred, untouchable, or based on intuition, feeling, or difficult-to-articulate concepts. This produces a complex web of topics that have to be avoided or tread carefully around, and a heavy emphasis on hard-to-articulate reasons for preferring one view over another.
Developing AI around a tool-box / multi-magisteria approach, where systems are made up of a wide array of various components, limits generalizability and leads to brittleness.
One very specific trend I’ve noticed lately in articles that aim to discredit the AGI intelligence-explosion hypothesis is that they tend to take the tool-box approach when discussing intelligence, and use that to argue that recursive self-improvement is likely impossible. So rationalists should be highly interested in this kind of reasoning. One of Eliezer’s primary motivations for writing the Sequences was to make the case for a unified approach to reasoning, because it lends credence to a view of intelligence in which intelligence can be replicated by machines, and in which intelligence is potentially unbounded. And also that this was a subtle and tough enough subject that it required hundreds of blog posts to argue for. Because of the subtle nature of the arguments, I’m not particularly surprised by this drift, but I am concerned about it. I would prefer if we didn’t drift.
I’m trying not to sound No-True-Scotsman-y here, but I wonder what it is that could make one a rationalist if they take the tool-box perspective. After all, even with a multi-magisterium world-view, there is always an underlying guiding principle directing the use of the proper tools. Often, this guiding principle is based on intuition, which is a remarkably hard thing to pin down and describe well. I personally interpret the word ‘rationalism’ as meaning, in the weakest and most general sense, that there is an explanation for everything; so intelligence isn’t irreducibly based on hand-wavy concepts such as ingenuity and creativity. Rationalists believe that those things have explanations, and once we have those explanations, there is no further use for tool-boxism.
I’ll repeat the distinction between tool-boxism and one-magisterium Bayes, because I believe it’s that important: Tool-boxism implies that there is no underlying theory that describes the mechanisms of intelligence. And this assumption basically implies that intelligence is either composed of irreducible components (where one component does not necessarily help you understand a different component) or some kind of essential property that cannot be replicated by algorithms or computation.
Why is tool-boxism the dominant paradigm then? Probably because it is the most pragmatically useful position to take in most circumstances when we don’t actually possess an underlying theory. But the fact that we sometimes don’t have an underlying theory or that the theory we do have isn’t developed to the point where it is empirically beating the tool box approach is sometimes taken as evidence that there isn’t a unifying theory. This is, in my opinion, the incorrect conclusion to draw from these observations.
Nevertheless, it seems like a startlingly common conclusion to draw. I think the great mystery is why this is so. I don’t have very convincing answers to that question, but I suspect it has something to do with how heavily our priors are biased against a unified theory of reasoning. It may also be due to the subtlety and complexity of the arguments for a unified theory. For that reason, I highly recommend reviewing those arguments (and few people other than E.T. Jaynes and Yudkowsky have made them). So with that said, let’s review a few of those arguments, starting with one of the myths surrounding Bayes theorem I’d like to debunk:
Bayes Theorem is a trivial consequence of the Kolmogorov Axioms, and is therefore not powerful.
This claim is usually presented as part of a broader claim that “Bayesian” probability is just a small part of regular probability theory, and therefore does not give us any more useful information than you’d get from just studying probability theory. And as a consequence, if you insist that you’re a “Strong” Bayesian, you’re insisting on using only that small subset of probability theory and the associated tools we call Bayesian.
The part of the statement that says the theorem is a trivial consequence of the Kolmogorov axioms is technically true. It’s the implication typically drawn from this that is false. The reason it’s false has to do with Bayes’ theorem being a non-trivial consequence of a simpler set of axioms / desiderata. This consequence is usually formalized by Cox’s theorem, which is usually glossed over or not quite appreciated for how far-reaching it actually is.
Recall that the qualitative desiderata for a set of reasoning rules were:
Degrees of plausibility are represented by real numbers.
Qualitative correspondence with common sense.
Consistency.
You can read the first two chapters of Jaynes’ book, Probability Theory: The Logic of Science, if you want more detail on what those desiderata mean. But the important thing to note is that they are merely desiderata, not axioms. This means we are not assuming those things are already true; we just want to devise a system that satisfies those properties. The beauty of Cox’s theorem is that it specifies exactly one set of rules satisfying these properties, and both Bayes’ theorem and the Kolmogorov axioms follow from those rules.
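For reference, the rules that Cox’s theorem singles out can be written compactly (a standard statement, following Jaynes’ notation):

```latex
% Any plausibility assignment satisfying the desiderata can be rescaled
% into a function P obeying the product and sum rules:
P(AB \mid C) = P(A \mid BC)\, P(B \mid C) \qquad \text{(product rule)}
P(A \mid C) + P(\bar{A} \mid C) = 1 \qquad \text{(sum rule)}
% Since P(AB \mid C) can be factored either way, equating the two
% factorizations and dividing gives Bayes' theorem:
P(A \mid BC) = \frac{P(B \mid AC)\, P(A \mid C)}{P(B \mid C)}
```

The Kolmogorov axioms for finite event spaces then follow as a special case, which is the sense in which Bayes’ theorem is downstream of something simpler than those axioms.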
The other nice thing about this is that degrees of plausibility can be assigned to any proposition, i.e., any statement to which you could possibly assign a truth value. It does not limit plausibility to “events” that take place in some kind of space of possible events, like whether a coin flip comes up heads or tails. What’s typically considered the alternative to Bayesian reasoning is Classical probability, sometimes called Frequentist probability, which only deals with events drawn from a sample space, and is not able to provide methods for probabilistic inference over a set of hypotheses.
For axioms, Cox’s theorem merely requires you to accept Boolean algebra and Calculus to be true, and then you can derive probability theory as extended logic from that. So this should be mindblowing, right? One Magisterium Bayes? QED? Well apparently this set of arguments is not convincing to everyone, and it’s not because people find Boolean logic and calculus hard to accept.
Rather, there are two major and several somewhat minor difficulties encountered within the Bayesian paradigm. The two major ones are as follows:
The problem of hypothesis generation.
The problem of assigning priors.
The list of minor problems are as follows, although like any list of minor issues, this is definitely not exhaustive:
Should you treat “evidence” for a hypothesis, or “data”, as having probability 1?
Bayesian methods are often computationally intractable.
How to update when you discover a “new” hypothesis.
Divergence in posterior beliefs for different individuals upon the acquisition of new data.
Most Bayesians typically never deny the existence of the first two problems. What some anti-Bayesians conclude from them, though, is that Bayesianism must be fatally flawed due to those problems, and that there is some other way of reasoning that would avoid or solve them. I’m skeptical about this, because if you really had a method for, say, hypothesis generation, this would actually imply logical omniscience, and would basically allow us to create full AGI, RIGHT NOW. If you really had the ability to produce a finite list containing the correct hypothesis for any problem, the existence of the other hypotheses in this list is practically a moot point: you have some way of generating the CORRECT hypothesis with a finite, computable algorithm. And that would make you a God.
As far as I know, being able to do this would imply that P = NP is true, and as far as I know, most computer scientists do not think it’s likely to be true (And even if it were true, we might not get a constructive proof from it). But I would ask: Is this really a strike against Bayesianism? Is the inability of Bayesian theory to provide a method for providing the correct hypothesis evidence that we can’t use it to analyze and update our own beliefs?
I would add that there are plenty of ways to generate hypotheses by other methods. For example, you can try to make the hypothesis space gargantuan, and encode different hypotheses in a vector of parameters, and then use different optimization or search procedures like evolutionary algorithms or gradient descent to find the most likely set of parameters. Not all of these methods are considered “Bayesian” in the sense that you are summarizing a posterior distribution over the parameters (although stochastic gradient descent might be). It seems like a full theory of intelligence might include methods for generating possible hypotheses. I think this is probably true, but I don’t know of any arguments that it would contradict Bayesian theory.
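As a toy illustration of the “gargantuan parameterized hypothesis space plus search” idea (the data and setup here are invented for illustration, not from any particular system): each parameter value is a distinct hypothesis, and a plain random search, nothing Bayesian about it, can still find the most likely one.

```python
import math
import random

# Invented data: 7 heads observed in 10 coin flips.
heads, flips = 7, 10

def log_likelihood(p):
    """Log-likelihood of the 'coin has bias p' hypothesis on the data."""
    if not 0 < p < 1:
        return float("-inf")
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

# Random search over the parameter space: a non-Bayesian way of
# generating and ranking hypotheses.
random.seed(0)
best_p = max((random.random() for _ in range(10_000)), key=log_likelihood)
print(round(best_p, 2))  # lands near the maximum-likelihood value 0.7
```

Swapping the random sampler for gradient steps or an evolutionary loop changes the search procedure, not the underlying point: hypothesis generation happened outside the Bayesian machinery, and nothing about that contradicts using Bayes to evaluate what was generated.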
The reason assigning prior probabilities is such a huge concern is that it forces Bayesians to hold “subjective” probabilities, where in most cases, if you’re not an expert in the domain of interest, you don’t really have a good argument for why you should hold one prior over another. Frequentists often contrast this with their methods which do not require priors, and thus hold some measure of objectivity.
E.T. Jaynes never considered this to be a flaw in Bayesian probability, per se. Rather, he considered hypothesis generation, as well as assigning priors, to be outside the scope of “plausible inference,” which is what he considered to be the domain of Bayesian probability. He himself argued for using the principle of maximum entropy for creating a prior distribution, and there are also more modern techniques such as Empirical Bayes.
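To sketch the maximum-entropy idea with a toy example of my own (not one of Jaynes’ worked problems): with no constraint beyond normalization, the uniform distribution over the six die faces has strictly higher entropy than any biased alternative, which is why maxent picks it as the least-committal prior.

```python
import math

def entropy(dist):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

uniform = [1 / 6] * 6
biased = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1]  # an arbitrary alternative prior

print(entropy(uniform))  # log2(6) ≈ 2.585 bits, the maximum possible
print(entropy(biased))   # strictly less: the bias encodes an unjustified claim
```

With additional constraints (say, a known mean), maxent instead yields the flattest distribution consistent with those constraints, so the prior encodes exactly the information you actually have and nothing more.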
In general, Frequentists often have the advantage that their methods are simpler and easier to compute, while also carrying strong guarantees about the results, as long as certain constraints are satisfied. Bayesians have the advantage that their methods are “ideal” in the sense of being consistent: you’ll get the same answer each time you run the analysis on the same data. This is the most common form of the examples Bayesians use when they profess the superiority of their approach. They typically show how Frequentist methods can give both “significant” and “non-significant” labels to the same results depending on how you perform the analysis, whereas the Bayesian way just gives you the probability of the hypothesis, plain and simple.
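A concrete version of that complaint, using the standard textbook numbers (9 heads in 12 tosses, which is not an example from this post): the same data yield different p-values depending on the experimenter’s stopping rule, even though the likelihood, and hence any Bayesian posterior, is identical in both cases.

```python
from math import comb

# Data: 9 heads in 12 tosses; test H0: the coin is fair (p = 0.5).

# Design 1: n = 12 tosses was fixed in advance (binomial sampling).
# One-sided p-value: P(at least 9 heads in 12 fair tosses).
p_binomial = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design 2: toss until the 3rd tail appears (negative binomial sampling);
# the 3rd tail happened to arrive on toss 12.
# One-sided p-value: P(3rd tail arrives on toss 12 or later).
p_negbinom = 1 - sum(comb(n - 1, 2) / 2**n for n in range(3, 12))

print(round(p_binomial, 4))  # ≈ 0.073: not significant at the 0.05 level
print(round(p_negbinom, 4))  # ≈ 0.0327: significant at the 0.05 level
```

Identical tosses, opposite verdicts. A Bayesian analysis conditions only on the data actually observed, so the stopping rule drops out.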
I think that in general, one could say that Frequentist methods are a lot more “tool-boxy” and Bayesian methods are more “generally applicable” (if computational tractability weren’t an issue). That gets me to the second myth I’d like to debunk:
Being a “Strong Bayesian” means avoiding all techniques not labeled with the stamp of approval from the Bayes Council.
Does this mean that Frequentist methods, because they are tool box approaches, are wrong or somehow bad to use, as some argue that Strong Bayesians claim? Not at all. There’s no reason not to use a specific tool, if it seems like the best way to get what you want, as long as you understand exactly what the results you’re getting mean. Sometimes I just want a prediction, and I don’t care how I get it – I know that a specific algorithm being labeled “Bayesian” doesn’t confer it any magical properties. Any Bayesian may want to know the frequentist properties of their model. It’s easy to forget that different communities of researchers flying the flag of their tribe developed some methods and then labeled them according to their tribal affiliation. That’s ok. The point is, if you really want to have a Strong Bayesian view, then you also have to assign probabilities to various properties of each tool in the toolbox.
Chances are, if you’re a statistics/data science practitioner with a few years of experience applying different techniques to different problems and different data sets, and you have some general intuitions about which techniques apply better to which domains, you’re probably doing this in a Bayesian way. That means, you hold some prior beliefs about whether Bayesian Logistic Regression or Random Forests is more likely to get what you want on this particular problem, you try one, and possibly update your beliefs once you get a result, according to what your models predicted.
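A minimal sketch of that kind of updating (the numbers and the two-model setup are invented for illustration): treat “Bayesian Logistic Regression beats Random Forests on problems like this one” as a Bernoulli rate with a Beta prior, and do a conjugate update after each head-to-head comparison.

```python
# Belief about "model A outperforms model B on this kind of problem",
# tracked as a Beta(alpha, beta) distribution over a Bernoulli rate.
alpha, beta = 2.0, 2.0  # weak prior: roughly a coin flip either way

def update(alpha, beta, a_won):
    """Conjugate Beta-Bernoulli update after one comparison."""
    return (alpha + 1, beta) if a_won else (alpha, beta + 1)

# Suppose model A wins 3 of 4 comparisons on past projects.
for a_won in [True, True, False, True]:
    alpha, beta = update(alpha, beta, a_won)

posterior_mean = alpha / (alpha + beta)
print(posterior_mean)  # 5/8 = 0.625: lean toward trying A first next time
```

The intuitive version of this, “A has worked for me more often, so I’ll reach for it first,” is just this arithmetic done implicitly, which is the sense in which an experienced practitioner’s tool choice is already Bayesian.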
Being a Bayesian often requires you to work with “black boxes”, or tools that you know give you a specific result, but you don’t have a full explanation of how it arrives at the result or how it fits in to the grand scheme of things. A Bayesian fundamentalist may refuse to work with any statistical tool like that, not realizing that in their everyday lives they often use tools, objects, or devices that aren’t fully transparent to them. But you can, and in fact do, have models about how those tools can be used and the results you’d get if you used them. The way you handle these models, even if they are held in intuition, probably looks pretty Bayesian upon deeper inspection.
I would suggest that instead of using the term “Fully Bayesian” we use the phrase “Infinitely Bayesian” to refer to using a Bayesian method for literally everything, because it more accurately shows that it would be impossible to actually model every single atom of knowledge probabilistically. It also makes it easier to see that even the Strongest Bayesian you know probably isn’t advocating this.
Let me return to the “minor problems” I mentioned earlier, because they are pretty interesting. Some epistemologists have a problem with Bayesian updating because, they say, it requires you to assume that the “evidence” you receive at any given point is completely true with probability 1. I don’t really understand why it would require this. I’m easily able to handle the case where I’m uncertain about my data. Take the situation where my friend is rolling a six-sided die, and I want to know the probability of it coming up 6. I assume all sides are equally likely, so my prior probability for 6 is 1/6. Let’s say that he rolls it where I can’t see it, and then tells me the die came up even. What do I update p(6) to?
Let’s say that I take my data as saying “the die came up even.” Then p(6 | even) = p(even | 6) * p(6) / p(even) = 1 * (1/6) / (1/2) = 1/3. Ok, so I should update p(6) to 1/3 now, right? Well, that’s only if I take the evidence “the die came up even” as being completely true with probability one. But what actually happened is that my friend TOLD ME the die came up even. He could have been lying, maybe he forgot what “even” meant, maybe his glasses were really smudged, or maybe aliens took over his brain at that exact moment and made him say that. So let’s say I give a 90% chance to him telling the truth, or equivalently, a 90% chance that my data is true. What do I update p(6) to now?
It’s pretty simple. I just expand p(6) over “even” as p(6) = p(6 | even) p(even) + p(6 | odd) p(odd). Before he said anything, p(even) = p(odd) and this formula evaluated to (1/3)(1/2) + (0)(1/2) = 1/6, my prior. After he told me the die came up even, I update p(even) to 0.9, and the formula becomes (1/3)(9/10) + (0)(1/10) = 9/30 = 0.3. A little less than 1/3. Makes sense.
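The arithmetic above can be checked directly (a sketch of the same calculation, using only the post’s own numbers; this way of updating on uncertain evidence is often called Jeffrey conditioning):

```python
# Prior over die faces is uniform, so p(6) = 1/6 and p(even) = 1/2.
p_six_given_even = (1.0 * (1 / 6)) / (1 / 2)  # Bayes: p(6 | even) = 1/3
p_six_given_odd = 0.0                          # 6 is even, so this is 0

def p_six(p_even):
    """Expand p(6) over the even/odd partition, with uncertain 'even'."""
    return p_six_given_even * p_even + p_six_given_odd * (1 - p_even)

print(p_six(0.5))  # before the report: 1/6, the prior
print(p_six(0.9))  # after a 90%-trustworthy report of "even": 0.3
```

Setting p_even to 1.0 recovers the naive certain-evidence update of 1/3, so treating evidence as certain is just the limiting case, not a requirement of the framework.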
In general, I am able to model anything probabilistically in the Bayesian framework, including my data. So I’m not sure where the objection comes from. It’s true that from a modeling perspective, and a computational one, I have to stop somewhere, and just accept for the sake of pragmatism that probabilities very close to 1 should be treated as if they were 1, and not model those. Not doing that, and just going on forever, would mean being Infinitely Bayesian. But I don’t see why this counts as problem for Bayesianism. Again, I’m not trying to be omniscient. I just want a framework for working with any part of reality, not all of reality at once. The former is what I consider “One Magisterium” to mean, not the latter.
The rest of the minor issues are also related to limitations that any finite intelligence is going to have no matter what. They should all, though, get easier as access to data increases, models get better, and computational ability gets better.
Finally, I’d like to return to an issue that I think is most relevant to the ideas I’ve been discussing here. In AI risk, it is commonly argued that a sufficiently intelligent agent will be able to modify itself to become more intelligent. This premise assumes that an agent will have some theory of intelligence that allows it to understand which updates to itself are more likely to be improvements. Because of that, many who argue against “AI Alarmism” will argue against the premise that there is a unified theory of intelligence. In “Superintelligence: The Idea that Eats Smart People”, I think most of the arguments can be reduced to basically saying as much.
From what I can tell, most arguments against AI risk in general will take the form of anecdotes about how really really smart people like Albert Einstein were very bad at certain other tasks, and that this is proof that there is no theory of intelligence that can be used to create a self-improving AI. Well, more accurately, these arguments are worded as “There is no single axis on which to measure intelligence” but what they mean is the former, since even multiple axes of intelligence (such as measure of success on different tasks) would not actually imply that there isn’t one theory of reasoning. What multiple axes of measuring intelligence do imply is that within a given brain, the brain may have devoted more space to better modeling certain tasks than others, and that maybe the brain isn’t quite that elastic, and has a hard time picking up new tasks.
The other direction in which to argue against AI risk is to argue against the proposed theories of reasoning themselves, like Bayesianism. The alternative, it seems, is tool-boxism. I really want to avoid tool-boxism because it makes it difficult to be a rationalist. Even if Bayesianism turns out to be wrong, does this exclude other, possibly undiscovered theories of reasoning? I’ve never seen that touched upon by any of the AI risk deniers. As long as there is a theory of reasoning, then presumably a machine intelligence could come to understand that theory and all of its consequences, and use that to update itself.
I think the simplest summary of my post is this: A Bayesian need not be Bayesian in all things, for reasons of practicality. But a Bayesian can be Bayesian in any given thing, and this is what is meant by “One Magisterium”.
I didn’t get to cover every corollary of tool-boxism or every issue with Bayesian statistics, but this post is already really long, and for the sake of brevity I will end it here. Perhaps I can cover those issues more thoroughly in a future post.
As the author of the post you linked in the first paragraph, I may be able to provide some useful context, at least for that particular post.
Arguments for and against Strong Bayesianism have been a pet obsession of mine for a long time, and I’ve written a whole bunch about them over the years. (Not because I thought it was especially important to do so, just because I found it fun.) The result is that there are a bunch of (mostly) anti-Bayes arguments scattered throughout several years of posts on my tumblr. For quite a while, I’d had “put a bunch of that stuff in a single place” on my to-do list, and I wrote that post just to check that off my to-do list. Almost none of the material in there is new, and nothing in there would surprise anyone who had been keeping up with the Bayes-related posts on my tumblr. Writing the post was housekeeping, not nailing 95 theses on a church door.
As you might expect, I disagree with a number of the more specific/technical claims you’ve made in this post, but I am with you in feeling like these arguments are retreading old ground, and I’m at the point where writing more words on the internet about Bayes has mostly stopped being fun.
It’s also worth noting that my relation to the rationalist community is not very goal-directed. I like talking to rationalists, I do it all the time on tumblr and discord and sometimes in meatspace, and I find all the big topics (including AGI stuff) fun to talk about. I am not interested in pushing the rationalist community in one direction or another; if I argue about Bayes or AGI, it’s in order to have fun and/or because I value knowledge and insight (etc.) in general, not because I am worried that rationalists are “wasting time” on those things when they could be doing some other great thing I want them to do. Stuff like “what does it even mean to be a non-Bayesian rationalist?” is mostly orthogonal to my interests, since to me “rationalists” just means “a certain group of people whose members I often enjoy talking to.”
Thanks for your response. I did find your post very interesting and enjoyable to read.
Incidentally, it is mostly my worry that retreading old ground might be less valuable to the community, and that it might be useful to accept a common framework, not necessarily that anyone was arguing the rationalist community as a whole should move in a certain direction (in reverse momentum from wherever they were headed before), or accept a different framework. I’m probably more goal-directed than most of the rationalist community, but that could be due to an idealism that hasn’t yet had time to be tempered.
I keep being surprised at how little rationalists care about what’s true. If you got something right the first time around, there is no need to revisit it; if you didn’t, there is. There is no general rule against revisiting.
On the contrary, I think rationalists are often overly hesitant to act at all, or pursue much of any concrete goals, until they have reached a quite high threshold of certainty about whether or not they are correct about those goals first. If rationalists really didn’t care about what’s true, you’d probably see a lot more aggressive risk taking by them. But our problem seems to be risk aversion, not recklessness.
Let me rephrase it: I don’t see why you care so little about what is true. You are arguing for strong Bayesianism on the grounds that it would be nice if it worked, not on the grounds that it works.
I am arguing against tool-boxism, on the grounds that if it were accepted as true (I don’t think it can actually be true in a meaningful sense) you basically give up on the ability to converge on truth in an objective sense. Any kind of objective principles would not be tool-boxism.
It seems that those who feel tool-boxism is false converge on Bayesianism as a set of principles: not that those principles are the full story, or that there are no other consequences or ways to extend them, but that there is no domain in which they can be both meaningfully applied and give the wrong answer.
You need to distinguish between truth and usefulness. If the justification of using different tools is purely on the basis of efficiency (in the limit, being able to solve a problem at all), then nothing is implied about the ability to converge on truth. Toolbox-ism does not necessarily imply pluralism in the resulting maps. There is also a thing where people advocate the use of multiple theories with different content, leading to an overall pluralism/relativism, but in view of the usefulness/truth distinction that is a different thing.
If they are not the full story, then you need other tools. You are saying contradictory things. Sometimes you say Bayes is the only tool you need, sometimes you say it can only do one thing.
Not giving the wrong answer is not a sufficient criterion for giving the right answer. To get the right answer, you need to get the hypothesis that corresponds to reality, somehow, and you need to confirm it. Recall that Bayes does not give you any method for generating hypotheses, let alone one guaranteed to generate the one true one in an acceptable period of time. So Bayes does not guarantee truth—truth as correspondence, that is.
This sounds like you argue against it on the grounds that you don’t like a state of affairs where tool-boxism is true, so you assume it isn’t. This seems to me like motivated reasoning.
It’s structurally similar to the person who says they are believing in God because if God doesn’t exist that would mean that life is meaningless.
I don’t think it’s possible to have unmotivated reasoning. Nearly all reasoning begins by assuming a set of propositions, such as axioms, to be true, before following all the implications. If I believe objectivity is true, then I want to know what follows from it. Note that Cox’s theorem proceeds similarly, by forming a set of desiderata first, and then finding a set of rules that satisfies them. Do you not consider this chain of reasoning to be valid?
(If I strongly believed “life is meaningless” to be false, and I believed that “God does not exist implies life is meaningless” then concluding from those that God exists is logically valid. Whether or not the two first propositions are themselves valid is another question)
There’s motivation and there’s motivation. Bad motivation is when an object-level proposition is taken as the necessary output of an epistemological process, and the epistemology is chosen to beg the question. Good motivation is avoiding question-begging in your epistemology.
One thing about that chain of reasoning is that it’s very unbayesian. We have catch-phrases like “0 and 1 aren’t probabilities”. Even if they are, how do you get your 1 as probability for the thesis of objectivity being true?
I guess this is a pretty subtle point to make, so I’ll try to state it more clearly again. Let’s assume tool-boxism is true in some deep ontological sense, such that, for any given problem in which we want to discover the truth, there are multiple sets of reasoning principles which each output a different answer. No one agrees on which principles are correct for each problem; everyone is guided by some combination of intuition, innate preferences, habit, tradition, culture, or whim. This is indeed the current situation in which we find ourselves, but if tool-boxism is indeed true, then that suggests this is the best we can do, i.e., objectivity is false. Rationalists at least seem to posit that objectivity is true.
It also means that all reasoning is necessarily motivated reasoning, if it has to be guided by subjective preferences. But even if objectivity is true, motivated reasoning is still a valid intellectual process, and probably the only possible process until that objective set of reasoning principles is discovered fully. Note that Cox’s theorem is based on motivated reasoning, in the sense that a set of desiderata is established first, before trying to determine a set of principles that satisfy those desiderata.
This is a nearly universal form of reasoning, especially in science, where one tries to establish a set of laws that agree with things that are found empirically. I don’t know if it’s possible to disentangle preferences entirely from beliefs.
I keep being surprised when I see anyone at all act a little bit like they care about what’s true, including me.
Keeping with my tradition of telling people to be less confident...
I strongly agree that the world is built on logic that can be understood by the individual human mind. And I think it’s likely that there are simple principles for correct reasoning, which might lead to intelligence explosion. Yay to you for resisting backwards drift on that!
But maybe let’s not tie that to the idea that all correct reasoning must approximate Bayes. Ironically, LW is the best source for arguments why Bayesian probability is itself an approximation to some more precise theory of uncertainty (UDT, Absent-Minded Driver, Psy-Kosh’s problem, Counterfactual Mugging, etc) and the many problems that remain even then (nature of observation, nature of priors, logical uncertainty, etc). In the end, a theory of uncertainty doesn’t just have to be correct in itself, it must also accurately model uncertainty, so it’s tied up with what it means to be an agent. We haven’t even scratched the surface of that.
In a physics lecture on quantum mechanics I once attended, the professor stated that theories like quantum field theory, string theory, and other types of quantum gravity were contained within plain quantum mechanics, because all of them had to work within the quantum framework (in the sense that they were quantum mechanics with more assumptions added).
I wonder if something similar is true for Bayesian probability, and the theories like UDT, Logical induction and things like that. Do any of these extensions violate Bayesian principles, making them overlap with them rather than contain them?
I think they do violate it. The Absent-Minded Driver problem is the simplest example, constructed to violate the independence axiom of vNM. Logical induction too, because the only position fully compatible with Bayes is logical omniscience, and we want to model logical non-omniscience (not knowing all true theorems). To tell an agent what to do in a situation, we need a model of uncertainty for the agent in the situation, which can be as complex as the agent and the situation. Bayesian probability is more of a tractable limit case, like Newtonian mechanics or Nash equilibrium.
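For concreteness, here is the planning-stage calculation for the standard Absent-Minded Driver payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both); a quick sketch, not tied to any particular decision theory:

```python
# Absent-Minded Driver (Piccione & Rubinstein): the driver cannot tell the
# two intersections apart, so he commits to a single probability p of
# continuing whenever he finds himself at an intersection.
# Payoffs: exit at first = 0, exit at second = 4, continue past both = 1.

def expected_payoff(p):
    # exit immediately:   prob (1 - p),     payoff 0 (term omitted)
    # continue then exit: prob p * (1 - p), payoff 4
    # continue twice:     prob p * p,       payoff 1
    return 4 * p * (1 - p) + 1 * p * p

# Grid search over commitment probabilities.
best_p = max((i / 1000 for i in range(1001)), key=expected_payoff)
print(round(best_p, 3), round(expected_payoff(best_p), 3))  # ~0.667 ~1.333
```

The planning-stage optimum is p = 2/3. The trouble starts when the driver, standing at an intersection, tries to assign a degree of certainty to “am I at the first intersection?” and re-optimize: the naive update pulls him away from the plan, which is the sense in which degree-of-certainty reasoning gives the wrong answer here.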
These are not violations of Bayesian probability. VNM rationality exists independently of Bayes; logical induction might be a coherent extension of Bayesian probability to settings where classical logic (which is what presupposes omniscience) is not applicable; UDT similarly presupposes logical omniscience; counterfactual mugging is a problem of decision theory, not probability; etc.
Let’s keep Bayesian probability, decision theory, VNM rationality, classical logic, etc. all well separated.
If you separate Bayesian probability from decision theory, then it has no justification except self-consistency, and you can no longer say that all correct reasoning must approximate Bayes (which is the claim under discussion).
Sure it does. Haven’t you heard of Cox’s Theorem? It singles out (Bayesian) probability theory as the uniquely determined extension of propositional logic to handle degrees of certainty. There’s also my recent paper, “From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem”
https://authors.elsevier.com/a/1VIqc,KD6ZCKMf
or
https://arxiv.org/abs/1706.05261
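For readers who haven’t seen it: Cox’s theorem starts from desiderata (degrees of plausibility are real numbers, they vary consistently with propositional logic, equivalent states of knowledge get equal plausibilities) and derives that any consistent system must be isomorphic to probability theory, i.e. must satisfy the product and sum rules, from which Bayes’ theorem follows:

```latex
% Product rule
P(A \wedge B \mid C) = P(A \mid C)\, P(B \mid A \wedge C)

% Sum rule
P(A \mid C) + P(\lnot A \mid C) = 1

% Bayes' theorem, a rearrangement of the product rule
P(A \mid B \wedge C) = \frac{P(B \mid A \wedge C)\, P(A \mid C)}{P(B \mid C)}
```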
I guess the problematic assumption is that we want to assign degrees of certainty. That doesn’t hold in AMD-like situations. They require reasoning under uncertainty, but any reasoning based on degrees of certainty leads to the wrong answer.
Correct inference must approximate Bayes. Correct reasoning is inference + hypothesis generation and updating + deciding what counts as evidence.
Decision theories are concerned with the last piece of the puzzle.
If I’m wrong, please show me a not-obviously-wrong theory that violates Bayes’ theorem...
I think that bayes with resource constraints looks a lot like toolbox-ism.
Take for example the problem of reversing MD5 hashes. You could Bayesianly update your probabilities of which original string maps to which hash, but computationally there is no uncertainty, so there is no point storing probabilities, and you strip those out. Or you could just download a rainbow table as a ready-made tool and not have to compute the hashes yourself at all.
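To illustrate (a toy sketch; a real rainbow table uses time-memory trade-off chains rather than a plain dictionary, but the point about uncertainty is the same):

```python
import hashlib

# Precompute hash -> preimage over a (tiny, illustrative) candidate space.
candidates = ["hunter2", "password", "letmein", "correcthorse"]
table = {hashlib.md5(s.encode()).hexdigest(): s for s in candidates}

# "Inverting" a hash is now a deterministic lookup, not an inference
# problem: the posterior is a point mass, so the probabilities were
# dead weight all along.
target = hashlib.md5(b"letmein").hexdigest()
print(table.get(target))  # -> letmein
```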
Or consider computing the joint probability of two streams of data that come in at different times. Say you always have event A at time T, but the corresponding B can come in immediately or days to weeks later. You might not be able to store all the information about A if it is arriving quickly (e.g. network interface activity), so you have to choose which data to drop. Perhaps you age it out after a certain time, or drop the usual-looking entries, but either way you can no longer Bayesianly update properly when that B comes in. Bayes doesn’t say what you should drop. You should drop the unimportant stuff, but figuring out what is unimportant a priori seems implausible.
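A minimal sketch of that failure mode (hypothetical event tuples, with a fixed-size buffer standing in for the resource constraint):

```python
from collections import deque

# Fixed-size buffer: only the 3 most recent A-events survive.
recent_A = deque(maxlen=3)
for t in range(6):               # A fires at every tick t = 0..5
    recent_A.append(("A", t))

# A B-event referring to t = 0 arrives much later...
late_B = ("B", 0)
matches = [a for a in recent_A if a[1] == late_B[1]]
print(matches)  # -> []  the matching A-record was aged out
```

Nothing in Bayes’ theorem told us that evicting the oldest records was the right policy; that choice came from somewhere outside the formalism.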
So I still use bayes upon occasion when it makes sense and still think AI could be highly risky. But I think I’m a toolbox-ist.
I agree. In particular I’ve noticed that a lot of frequentist methods can be described in terms of the Bayesian method by replacing a step where you take the expected value of some quantity with taking the worst case of that quantity. This improves computational efficiency by avoiding a difficult integral. Of course we choose the worst case rather than the best case because humans are risk-averse. This shows an interesting fact: Bayesian methods are the same for every agent, but when resource constraints force you away from Bayesian methods, your inferences can end up depending on your utility function. (And because humans are risk-averse, the frequentist method often looks like it is being more responsible and conservative than the Bayesian method, even though the Bayesian method will in fact always produce predictions with an optimal amount of risk-aversion if you use it with an accurate utility function.)
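Here is a toy decision problem (hypothetical numbers) showing the swap: minimizing posterior-expected loss picks the posterior mean, while minimizing worst-case loss picks the midpoint of the support, independent of the beliefs:

```python
# Two candidate parameter values with a (hypothetical) posterior over them.
thetas    = [0.2, 0.9]
posterior = [0.8, 0.2]

def squared_loss(a, theta):
    return (a - theta) ** 2

grid = [i / 1000 for i in range(1001)]

# Bayesian choice: minimize posterior-EXPECTED loss -> the posterior mean.
bayes = min(grid, key=lambda a: sum(p * squared_loss(a, t)
                                    for p, t in zip(posterior, thetas)))

# Worst-case ("frequentist-flavored") choice: minimize the MAXIMUM loss.
minimax = min(grid, key=lambda a: max(squared_loss(a, t) for t in thetas))

print(bayes)    # -> 0.34  (posterior mean: 0.8*0.2 + 0.2*0.9)
print(minimax)  # -> 0.55  (midpoint of [0.2, 0.9], prior-independent)
```

The Bayesian answer moves with the posterior; the minimax answer doesn’t, which is exactly the sense in which the latter is the same for every agent regardless of beliefs.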
Are you a statistician?
If yes: what’s your favorite paper you wrote on Bayes?
If not: why are you telling experts what to do?
-1
That’s a request for the “argument from authority” fallacy. Freethinkers discuss ideas directly on their merits, not on their author’s job description. A rationalist doesn’t ignore any evidence, of course (even authors’ job descriptions), but tries to weight it accurately, okay?
Will do, thank you.
edit:
“Chances are, if you’re a statistics/data science practitioner with a few years of experience applying different techniques to different problems and different data sets, and you have some general intuitions about which techniques apply better to which domains, you’re probably doing this in a Bayesian way.”
Freethinkers who might know a little bit about statistics/data science would, presumably by your lights, stop reading here.
I have seen a lot of this type of overconfident stuff out of LW over the years. That is, in Bayesian terms, I already had a prior, and it already updated in a “read a textbook and stop blogging” direction.
In general, there is no good way for me to know what your prior is on the subject I’m going to write about, unless I already knew you really well. And it’s unreasonable to expect me to know all of my audience’s priors and to write something that agrees with them; I don’t think it’s possible. I’m not optimizing for the argmax of the Less Wrong commenters anyway. I’d rather write something that’s probably wrong, so I can update from it if I need to. Your kind of response, saying “read a textbook and stop blogging,” seems to go against the spirit of free thought and debate anyway. Personally, I think the tendency of people to respond that way is probably why communities like these dissipate after a while. I wrote a post about this subject as well.
Why write about technical stuff you don’t really know very well? How did you expect that to go?
I find this attitude of “lets blog about stuff that’s probably wrong with the aim of updating later when people correct you” kind of a weird way to go. Why not just go and read directly about what you want to learn about?
Your way helps you, of course, but generates negative externalities for everybody else (who have to either spend time correcting you if they know better, or get misled by you if they don’t).
Wasn’t scholarship (e.g. reading stuff) one of the virtues?
Scholarship was mentioned as one of the virtues, but EY didn’t put it into practice much. People tend to learn by imitation.
At what point do you consider yourself to have read enough? At what point do you decide that you’ve read enough textbooks and now it’s ok to blog?
Also, I don’t see how this creates negative externalities on people. That kind of assumes a bizarre situation where people are either forced to read, or forced to respond to things. Apply that reasoning to basically the entire internet, social media, every day discussions with people, and you basically have to quarantine yourself from most of the world to avoid that risk. Or you conclude that all speech has to be meticulously curated so that there is very low risk of misleading, offending, upsetting, or otherwise wasting someone’s time.
“At what point do you consider yourself to have read enough?”
How about a single statistics class at a university. At that point one might appreciate the set of things one might not know about yet. In reality, though, I feel that if you want to blog about technical topics, you should be an expert on said technical topics. If you are not an expert, it seems you should listen, not talk.
“Also, I don’t see how this creates negative externalities on people.”
Conditional on you being wrong, you should expect no negative externalities only if you expect the activity of blogging to be akin to pissing into the wind—you don’t expect folks to take you seriously anyways. If you are not an expert yet, you should not be very confident about avoiding being wrong on technical topics.
If folks do take you seriously, they either get the wrong idea, or have to spend energy correcting you, or leave it alone, and let you mislead others.
Not sure why you would say this (assuming I haven’t even done that) and then immediately admit that you expect something much higher. What that level of expertise is I’m not sure, but probably having a Ph.D in statistics?
I have an undergraduate degree in math / physics, and I’ve been working at a data science job for 3 years, while spending most of my free time studying these subjects. I wouldn’t call myself an expert, but at least personally, I think I’ve reached a point where I can feasibly have discussions with people about statistics / ML, and not say things that are totally far off from where at least a certain mode of experts are on the subject.
Of course, the topic I was discussing is actually somewhere on the border of statistics, mathematics, and philosophy, and my guess is there are few academic programs that focus specifically on that overlapping region. That makes it very unlikely for anyone on this site to be at the level of expertise you demand. And if the subject is really that esoteric, it also makes it more unlikely that someone would somehow damagingly misuse what they read here. There are no infohazards (as far as I know) in my post, and there really aren’t any concrete suggestions for actions to take, either.
Maybe there is a cultural/generational difference here.
I have seen very little on Bayes out of LW over the years I agree with—take it as a datapoint if you wish. Most of it is somewhere between at least somewhat wrong and not even wrong.
Hanson had a post somewhere on how folks should practice holding strong opinions and arguing for them, but not taking the whole thing very seriously. Maybe that’s what you are doing.
There may indeed be a cultural difference here.
LessWrong has tended towards skepticism (though not outright rejection) of academic credentials (consider Eliezer’s “argument trumps authority” discussions in the Sequences). However, this site is more or less a place for somewhat informal intellectual discussion. It is not an authoritative information repository, and as far as I can tell, does not claim to be. Anyone who participates in discussions here is probably well aware of this fact, and is fully expected to be able to consider the arguments here, not take them at face value.
If you disagree with some of the core ideas around this community (like Bayesian epistemology), as well as what you perceive to be the “negative externalities” of the tendency towards informal / non-expert discussion, then to me it seems likely that you disagree with certain aspects of the culture here. But you seem to have chosen to oppose those aspects, rather than simply choosing not to participate.
I don’t really have time to “oppose” in the sense you mean, as that’s a full time job. But for the record this aspect of LW culture is busted, I think.
“somewhat informal intellectual discussion”
All I am saying is, if you are going to talk about technical topics, either: (a) know what you are talking about, or (b) if you don’t or aren’t sure, maybe read more and talk less, or at least put disclaimers somewhere. That’s at least a better standard than what [university freshmen everywhere] are doing.
If you think you know what you are talking about, but then someone corrects you on something basic, heavily update towards (b).
I try to adhere to this also, actually—on technical stuff I don’t know super well. Which is a lot of stuff.
The kind of meaningless trash talk MrMind is engaged in above, I find super obnoxious.
But this is a philosophical position you’re taking. You’re not just explaining to us what common decency and protocol should dictate—you’re arguing for a particular conception of discourse norms you believe should be adopted. And probably, in this community, a minority position at that. But, the way that you have stated this comes across like you think your position is obvious, to the point where it’s not really worth arguing for. To me, it doesn’t seem so obvious. Moreover, if it’s not obvious, and if you were to follow your own guidelines fully, you might decide to leave that argument up to the professional, fully credentialed philosophers.
Anyway, what you are bringing up is worth arguing about in my opinion. LW may be credential-agnostic, but it also would be beneficial to have some way of knowing which arguments carry the most weight, and what information is deemed the most reliable—while also allowing people of all levels of expertise to discuss it freely. Such a problem is very difficult, but I think following your principle of “only experts talk, non-experts listen” is sort of extreme and not really appropriate outside of classrooms and lecture halls.
I am saying there is a very easy explanation on why the stats community moved on and LW is still talking about this: LW’s thinking on this is “freshman level.”
I don’t think “know what you are talking about” is controversial, but perhaps I am just old.
I think it’s ok for non-experts to talk, I just think they need to signal stuff appropriately. Wikipedia has a similar problem with non-expert and expert talk being confused, which is why it’s not seen as a reliable source on technical topics.
Being “credential-agnostic” is sort of being a bad Bayesian—you should condition on all available evidence if you aren’t certain of claims (and you shouldn’t be if you aren’t an expert). Argument only screens authority under complete certainty.
Non-experts may not know the boundary of their own knowledge, and may also have trouble knowing where the boundaries of the knowledge of others are as well.
In fact, I think that quite frequently even experts have trouble knowing the extent of their own expertise. You can find countless examples of academics weighing in on matters they aren’t really qualified for. I think this is a particularly acute problem in the philosophy of science. This is a problem I had a lot when I read books by authors of pop-sci / pop-philosophy. They sure seem like experts to the non-initiated. I attribute this mainly to them becoming disconnected from academia and living in a bubble containing mostly just them and their fans, who don’t offer much in the way of substantive disagreement. But this is one of the reasons I value discussion so highly.
When I began writing this post, I did not honestly perceive my level of knowledge to be at the “freshman” level. As I’ve mentioned before, many of the points are re-hashes of stuff from people like Jaynes, and although I might have missed some of his subtle points, is there any good way for me to know that he represents a minority or obsolete position without being deeply familiar with the aspects of that field, as someone with decades of experience would?
The simplest solution is just to read until I have that level of experience with the topics as measured by actual time spent on it, but I feel like that would come at the very high cost of not being able to participate in online discussions, which are valuable. But even then, I probably would still not know where my limits are until I bump into opposing views, which would need to occur through discussion.
Yes, absolutely. See also SMBC’s “send in the bishops, they can move diagonally” (chess masters on the Iraq war).
I don’t know if Jaynes represents a minority position (there are a lot of Bayesian statisticians). It’s more like the field moved on from this argument to more interesting arguments. Basically smart Bayesians and frequentists mostly understood each other’s arguments, and considered them mostly valid.
This is the type of B vs F argument people have these days (I linked this here before):
https://normaldeviate.wordpress.com/2012/08/28/robins-and-wasserman-respond-to-a-nobel-prize-winner/
If you really want the gory details, you can also read the Robins/Ritov paper. But it’s a hard paper.
Full disclosure: Robins was my former boss, and I am probably predisposed to liking his stuff.
Re: “what’s a good way to know”: I would say ask experts. Stat profs love talking about this stuff, you can email your local one, and try to go for coffee or something.
Re: “freshman level,” this was perhaps uncharitable phrasing. I just perceive, perhaps incorrectly, a lot of LW discussions as the type of discussion that takes place in dorms everywhere.
I skimmed this a bit, and it seems like the argument went several rounds but was never actually resolved in the end? See Chris Sims’ last comment here, which Robins and Wasserman apparently never responded to. Also, besides this type of highly technical discussion, can you point us to some texts that explain the overall history and current state of the F vs B debate in the professional stats community? I’d like to understand how and why they moved on from the kinds of discussion that LW is still having.
There is a recent book, Computer Age Statistical Inference by Efron and Hastie (who are well-respected statisticians). They start by distinguishing three kinds of statistics—frequentist (by which they mean Neyman and Pearson with some reliance on Fisher); Bayesian, which everybody here knows well; and Fisherian, by which they mean mostly maximum likelihood and its derivatives. They say that Fisher, though he was dismissive of the Bayesian approach, didn’t fully embrace frequentism either, and blazed his own path somewhere in the middle.
The book is downloadable as a PDF via the link.
We can ask Chris and Larry (I can if/when I see them).
My take on the way this argument got resolved is that Chris and Larry/Jamie agree on the math—namely that to “solve” the example using B methods we need to have a prior that depends on pi. The possible source of disagreement is interpretational.
Larry and Jamie think that this is Bayesians doing “frequentist pursuit”, that is using B machinery to mimic a fundamentally F behavior. As they say, there is nothing wrong with this, but the B here seems extraneous. Chris probably doesn’t see it that way, he probably thinks this is the natural way to do this problem in a B way.
The weird thing about (what I think is) Chris’ position here is that this example violates the “likelihood principle” some Bayesians like. The likelihood principle states that all information lives in the likelihood. Of course here the example is set up in such a way that the assignment probability pi(X) is (a) not a part of the likelihood and (b) highly informative. The natural way for a Bayesian to deal with this is to stick pi(X) in the prior. This is formally ok, but kind of weird and unnatural.
How weird and unnatural it is is a matter of interpretation, I suppose.
This example is very simple, there are much more complicated versions of this. For example, what if we don’t know pi(X), but have to model it? Does pi(X) still go into the prior? That way lie dragons...
I guess my point is, these types of highly technical discussions are the discussions that professionals have if B vs F comes up. If this is too technical, may I ask why even get into this? Maybe this level of technicality is the natural point of technicality for this argument in this, the year of our Lord 2017? This is kind of my point, if you aren’t a professional, why are you even talking about this?
It’s a good question about a history text on B vs F. Let me ask around.
edit: re: dragons, I guess what I mean is, it seems most things in life can be phrased in F or B ways. But there are a lot of phenomena for which the B phrasing, though it exists, isn’t really very clarifying. These might include identification and model misspecification issues. In such cases the B phrasing just feels like carrying around ideological baggage.
My philosophy is inherently multiparadigm—you use the style of kung fu that yields the most benefit or the most clarity for the problem. Sometimes that’s B and sometimes that’s F and sometimes that’s something else. I guess in your language that would be “instrumental rationality in data analysis.”
I don’t think that having a conversation with someone who’s wrong is necessarily bad for myself. Arguing against someone who’s wrong can help me to clarify my own thoughts on a topic.
CFAR supports the notion that one of the best ways to learn is to teach. Mixing reading textbooks passively with active argument is good for learning a subject well.
That’s fine, but can OP at least preface with [Epistemic status: may not know what I am talking about]?
What did you expect with “Very partisan / opinionated”? I don’t think that’s how the average academic expert would preface his professional position, if academics were in the habit of stating epistemic statuses at all.
I was not asking for a signal “I am not an academic.” I was asking for a signal “don’t take this too seriously, dear reader.”
There is a big difference between having strong opinions and being wrong.
I was hoping that “very partisan” would signal that I recognize there are a sizable chunk of people with very different views on the subject, and that recognition indicates some kind of epistemic humility. I was wrong about that, and in the future I’ll try to indicate that more explicitly.
The problem of how much knowledge is enough has an age old solution: academic credentials.
I think the real answer is about people’s motives.
Reading stuff without talking about it isn’t going to impress anyone, since they won’t even know.
Because “experts” are fucking it up left and right.
Ilya is a student and coauthor of Judea Pearl, whose work on causality and Bayes nets was cited by Eliezer many times. He’s an expert at the stuff that LW is amateuring in.
A: Ilya is a statistician.
B: Ilya is an expert in Bayes probability, and is never wrong.
So:
C: Every statistician is an expert in Bayes probability, and they are never wrong.
Corollary: the replication crisis is a conspiracy of the Bayes Shadow Government.
Psychologists are not statisticians, though. Generally they are relatively naive users of stats methods (as are a lot of other applied folks, e.g. doctors that publish, cognitive scientists, social scientists, epidemiologists, etc.) Ideally, methods folks and applied folks collaborate, but this does not always happen.
You can fish for positive findings with B methods just fine, the issue isn’t F vs B, the issue is bad publication incentives.
There is also a little bit of “there is a huge replication crisis on, long story short, we should read this random dude’s blog (with apologies to the OP).”
Pearl is, apparently, only half Bayesian.
I am wrong a lot—I can point you to some errors in my papers if you want.
The replication crisis is decomposable into many pieces, two of which are surely bad incentives and the relative inexperience of the “applied folks”. Another thought is, that’s the main point, that frequentist methods are a set of ad-hoc, poorly explained, poorly understood heuristics. No wonder they are used improperly.
On the other hand, I’ve seen the crisis explained mostly by Bayesian statisticians, so I’m possibly in a bubble. If you can point me to a frequentist explanation I would be glad to pop it.
Apparently though, cousin_it thinks you cannot be criticized or argued against...
“Another thought is, that’s the main point, that frequentist methods are a set of ad-hoc, poorly explained, poorly understood heuristics.”
I don’t think so. This is what LW repeatedly gets wrong, and I am kind of tired of talking about it. How are you so confident re: what frequentist methods really are about, if you aren’t a statistician? This is incredibly bizarre to me.
Rather than argue about it constantly, which I am very very tired of doing (see above “negative externalities”), I can point you to Larry Wasserman’s book “All of Statistics.” It’s a nice frequentist book. Start there, perhaps. Larry is very smart, one of the smartest statisticians alive, I think.
My culture thrives on peer review, as much as we grumble about it. Emphasis on “peer,” of course.
You should probably be a bit more charitable to cousin_it, he’s very smart too.
I was under the impression that it was sufficient to read statistics books. Apparently though, you need also to be anointed by another statistician to even talk about the subject.
You seem to imply that no statistician has ever criticized frequentist methods. LW is just parroting what others, more expert men already said.
Isn’t it, as long as you’re making an incorrect statement, irrelevant how intelligent you are? Jaynes was wrong about quantum mechanics. Einstein was wrong about the unified field.
Everybody can be wrong, no matter how respected or intelligent they are.
“I was under the impression that it was sufficient to read statistics books.”
Ok, what have you read?
I am not the “blogging police,” I am just saying, based on past experience, that when people who aren’t statisticians talk about these issues, the result is very low quality. So low that it would have been better to stay silent. Statistics is a very mathematical field. These types of arguments are akin to “should we think about mathematics topologically or algebraically?”
“You seem to imply that no statistician has ever criticized frequentist methods.”
See “Tom Knight and the LISP machine”:
http://catb.org/jargon/html/koans.html
One of these koans is pretty Bayesian, actually, the one about tic-tac-toe.
“Isn’t it, as long as you’re making an incorrect statement, irrelevant how intelligent you are?”
Sure is, but how certain are you it’s incorrect? If uncertain, intelligence is useful information you should Bayes Theorem in.
And anyways, charity is about interpreting reasonably what people say.
The pretty standard Bayesian curriculum: De Finetti, Jaynes-Bretthorst, Sivia.
I love Lisp koans much more than I love Lisp… Anyway, it’s still a question of knowing a subject, not being part of a cabal.
Well, I prefer evidence to signalling: if the problem is only my tediousness in refusing to accept a settled argument, someone can simply point me to a paper, a blog post, or a book saying “here, this shows clearly that the replication crisis happened for this reason, not because of the opaqueness of frequentist methods”. I am willing to update. I have done it in the past many times; I’m confident I can do it this time too.
Here, all this “He is very intelligent! No, you are very intelligent!” is… sad.
I guess the natural question is—what about a standard frequentist curriculum? Lots of stuff in stats is neither B nor F (for example the book my group and I are going through now).
“it’s still a question of knowing a subject”
Indeed. That’s exactly the point.
The most common way I see “fishing” manifest with Bayesian methods is changing the prior until you get the signal you want. In fact, the “clarity” of Bayesian machinery even aids and abets this type of practice.
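For example (hypothetical data and priors, with a grid approximation in place of the closed-form Beta posterior), the same 7-heads-in-10 data can “show” a heads-biased coin or not, depending on which prior you shop for:

```python
# Prior-fishing: same data, two different Beta priors, and the headline
# number P(theta > 0.5) moves substantially.
heads, n = 7, 10
grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoint grid on (0, 1)

def posterior_prob_above_half(a, b):
    # Unnormalized Beta(a, b) prior times binomial likelihood on the grid.
    post = [t ** (a - 1) * (1 - t) ** (b - 1)
            * t ** heads * (1 - t) ** (n - heads) for t in grid]
    z = sum(post)
    return sum(p for p, t in zip(post, grid) if t > 0.5) / z

print(posterior_prob_above_half(1, 1))   # flat prior: ~0.89, looks biased
print(posterior_prob_above_half(1, 10))  # "skeptical" prior: ~0.13, reversed
```

Nothing here is formally wrong; the abuse is in reporting only the prior that delivers the desired conclusion, which is the Bayesian analogue of p-hacking.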
You say you are willing to update—don’t you find it weird that basically the only place people still talk about B vs F is here on LW? Professional statisticians moved on from this argument decades ago.
The charitable view is LW likes arguing about unsettled philosophy, but aren’t up to speed on what real philosophical arguments are in the field. (In my field, for example, one argument is about testability, and how much should causal models assume). The uncharitable view is LW is addicted to online wankery.
Let me retrace the steps of this conversation, so that we have at least a direction to move towards.
The OP argued that we keep a careful eye so that we don’t drift from Bayesianism as the only correct mathematical form of inference.
You tried to silence him, saying that if he is not a statistician, he should not talk about that.
I pointed out that those who routinely use frequentist statistics are commonly fucking it up (the disaster over the RDA of vitamin D is another easily mockable mistake by frequentist statisticians).
The conversation then degenerated into dick-size measuring, only with IQ or academic credentials.
So, let me regroup what I believe to be true, so that specific parts of what I believe to be true can be attacked (but if it’s just: “you don’t have the credentials to talk about that” or “other intelligent people think differently”, please refrain).
1. the only correct foundation for inference and probability is Bayesian
2. Bayesian probability has broader applicability than frequentist probability
3. basic frequentist statistics can and should be reformulated from a Bayesian point of view
4. frequentist statistics is taught badly and applied even worse
5. point 4 bears no small responsibility for famous scientific mistakes
6. neither Bayesian nor frequentist statistics binds dishonest scientists
7. advanced statistics has much more in common with functional analysis and measure theory, so whether it's expressed in one form or the other matters less
8. LW has the merit of insisting on Bayes because frequentist statistics, being the academic tradition, has higher status, and no amount of mistakes derived from it seems able to make a dent in its reputation
9. Bayes' theorem is the basis of the first formally defined artificial intelligence
I hope this list can keep the discussion productive.
"The conversation then degenerates into dick-size measuring."
“I hope this list can keep the discussion productive.”
Alright then, Bayes away!
Generic advice for others: the growth mindset for stats (which is a very hard mathematical subject) is to be more like a grad student, e.g. work very very hard and read a lot, and maybe even try to publish. Leave arguing about philosophy to undergrads.
This sounds a lot like the Neil Tyson / Bill Nye attitude of “science has made philosophy obsolete!”
I don't agree with Tyson on this, I just think y'all aren't qualified to do philosophy of stats.
The Wikipedia page for replication crisis doesn’t mention frequentism or Bayesianism. The main reasons are more like the file drawer effect, publish or perish, etc. Of course an honest Bayesian wouldn’t be vulnerable to those, but neither would an honest frequentist.
Who else has said that science could and should be wholesale replaced by Bayes?
No one?
Also the point.
If I wanted to tell people what I thought they ought to do, I'd have written about decision theory instead. Depending on your decision theory, it might tell you to do something non-Bayesian, because you might not have a Bayesian technique right in front of you, but maybe you have a good heuristic that you know from experience works well. All I'm saying is that, probably, your reasoning approximates Bayesian reasoning, even when the "methods" you are using don't look Bayesian; the way you model those methods as a whole probably does.
Even if I were writing about decision theory, I don't really see why making an argument for a particular way of thinking is equivalent to "telling people what to do". Everything that gets written on Less Wrong is either an argument or a proposal, never a command. Eliezer isn't a statistician either, and yet here we are on his site dedicated to trying to figure out the right way to think. Besides that, I'm pretty sure there is plenty of low-hanging fruit in my essay that you could easily argue against, without going directly to argument from authority.
I certainly agree with you that Eliezer isn’t a statistician. I may disagree with you on the implications of this.
“All I’m saying is that, probably, your reasoning approximates Bayesian reasoning, even when the “methods” you are using don’t look Bayesian.”
If by “my reasoning” you mean me as a human using my brain, I don’t really see in what sense this is true. I do lots of things with my brain that aren’t Bayesian. If by “my reasoning” you mean stuff I do with data as a statistician, that’s simply false. For example, stuff I do with influence functions has no Bayesian analogue at all.
edit: there is probably some way I could set up some semi-parametric influence function stuff in a Bayesian way—I am not sure.
Funny thing though. If Ilya ever used an argument from authority on me, I’d thank him and start thinking hard about where I went wrong. You’ve read the sequences, right? Remember the praise for Judea Pearl? Well, Ilya is his student and coauthor. If he disagrees with you, it’s strong evidence.
Not sure if this is the best characterization: much of LW's stance towards Bayesianism has always come from the Word of Eliezer, rather than through any thorough discussion and debate. I'd say that skepticism of Bayesianism within our community isn't really "returning to subjects that have already been extensively discussed", but rather "subjecting foundational premises to the kind of criticism they need to undergo before people can trust them to be true, and before people really understand their extent and limitations".
If I try to apply this to protein folding instead of intelligence, it sounds really strange.
Most people who make useful progress on protein folding appear to use a relatively tool-boxy approach. And they all appear to believe that quantum mechanics provides a very good theory of protein folding. Or at least it would, given unbounded computing power.
Why is something similar not true for intelligence?
On LW we frequently invent new vocabulary in a way that’s confusing for outsiders. It seems to me like “One Magisterium Bayesianism” is a new term that’s not taken from anywhere and is likely relatively opaque.
Maybe it would make more sense to speak of Bayesian Monism?
Doesn’t “monism” pretty much mean belief in only one kind of thing rather than employing only one procedure for finding truth? I think calling the position described here “Bayesian Monism” would be actively misleading.
I think that tistanm doesn't just advocate Bayesianism as a method, but advocates that reality is shaped in such a way that its basic nature is represented by probability.
This would be the exact opposite of what Bayesianism says (that is, that probability is an optimal epistemic construction).
No doubt, but that isn’t the same as monism. You could have a world made of many kinds of stuff in which Bayesian inference is optimal, or a world made of one kind of stuff in which Bayesian inference produces terrible results.
It has been extensively discussed, but a lot of people still think “Bayes is the One Epistemology to Rule them All” is the correct conclusion.
Shouldn’t you want to believe what is true, not what leads to some arbitrary end-point?
Same problem. If it is actually the case that there is more than one method of reasoning, why pretend otherwise?
So can Bayes. Just set your priors so high that you will never accumulate significant contradictory evidence during your lifetime.
Everything is based on intuition, ultimately (e.g. your "qualitative correspondence with common sense").
That’s minor????
There is a very simple argument: if you need to supplement Bayes with a method of hypothesis generation, then you are no longer using Bayes alone, and you are therefore not even using Strong Bayes (NB: strong) yourself.
That just means Jaynes did not believe in Bayes is the One Epistemology to Rule them All. So why do you?
That means, in the context of Cox’s theorem, a very specific set of primitive intuitions about plausibilities, not that everything is reduced to how one feels about something.
Yes, and yes, and no. Yes, because everything, not just Cox's theorem, is based on some foundational assumption, however much you try to eliminate unjustified propositions from your epistemology. Yes, because having an epistemology with a few bedrock assumptions is not the same as deciding every damn thing with gut feelings. No, because the plausibility, to you, of a plausible assumption that you cannot otherwise justify is not that different from a feeling.
Well, that is true of any kind of axiom. In any case, it's a finite set of simple intuitions that defines the content of "common sense" for Cox's theorem, so that if two people disagree, they can point to exactly the formulas they disagree on.
That’s rather the point. It saves time to assume something like that from the outset.
Which may lead to agreeing to disagree rather than convergence.
I’ve rarely seen disagreement on basic axioms. p → p seems to be rather uncontroversial, although it’s based on ‘intuition’.
On the other hand, that's the purpose of deduction: to reduce the need for intuition to only the smallest and least controversial set of assertions. This does not imply, as your original formulation seems to, that intuition can then be used for everything.
I never said anything of the kind. My point was that intuition is unavoidably involved in everything.
Then check out the controversy over Euclid’s fifth postulate, mathematical intuitionism, the Axiom of Choice, whether existence is a predicate, etc, etc.
Some would say that it's based on truth tables, and defies intuition!
See the logical versus material implication controversy:
http://www.askphilosophers.org/question/4103
On this I think we agree. I'll just add that sometimes "intuition" points to "a short mental calculation" and other times to "a biased heuristic". The fact that we don't have access to which is which is the danger of accepting sentences like "intuition is the basis of everything".
I would prefer two different words for the two different kinds of intuition, but there aren't any.
Yes, but there has also never been controversy over the first postulate… Some axioms are more basic than others. And indeed, challenged axioms produce strong revolutions.
This I don't know how to interpret. Truth tables are useful as long as we agree on the axioms. Or one could say that truth tables are based on intuition...
It can mean either of those, but it can also mean an assumption you can neither prove nor do without.
If what you want is convergence on objective truth, it is the existence of axioms that people don’t agree on that is the problem.
And pluralism. Intuitionistic and classical maths co-existing, Euclidean and non-Euclidean geometry co-existing.
Truth tables give you a set of logical functions, some of which resemble traditional logical connectives, such as "and" and "implies", to some extent. But only to some extent. The worry is that they don't capture all the features of ordinary language usage.
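For concreteness, here is the full truth table for material implication, defined as (not p) or q; the rows with a false antecedent are exactly the counterintuitive ones, since "if p then q" comes out true whenever p is false:

```python
# Truth-table definition of material implication: p -> q  :=  (not p) or q.

def implies(p, q):
    """Material implication as a truth function."""
    return (not p) or q

# Print all four rows of the truth table.
for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q))

# In particular, p -> p is true on every row, i.e. a tautology.
assert all(implies(p, p) for p in (True, False))
```

Note that `implies(False, True)` and `implies(False, False)` are both true, which is precisely the behavior the ordinary-language "if...then" arguably does not share.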
One-Magisterium Bayes: A Defense of Tribalism
Did you read the post? It should be pretty clear that I'm not advocating Bayesian fundamentalism (I describe what I believe that means, and why it doesn't really square with actually being Bayesian).
I side with you on this issue. It irks me all the time when the Bayesian foundations are vaguely criticized with an air of superiority, as if dismissing them is a sign of having transcended to some higher level of existence (neorationalists, I’m looking at you).
On the other hand, I could accept tool-boxing, in accordance with the principle of "one truth, many methods to find it", if and only if:
it effectively showed better results than the Bayesian methods
it didn't suddenly forget decades of findings on the fallibility of human intuitions.
On the other hand:
This is provably true: P(X|X) = 1.
P(X) = P(X ∧ X) = P(X|X)P(X) ⇔ P(X|X) = 1.
That point was mostly referring to the rule you use when you perform the "Bayesian update": it can be either strict conditionalization (the new P(H) = P(H|E)), which assumes E is learned with certainty, or Jeffrey conditionalization (the new P(H) = P(H|E)P(E) + P(H|~E)P(~E), where P(E) is the new probability of E). The latter seems to be the most intuitively correct rule, but I guess there are some subtle issues with using it that I need to dive deeper into to really understand.
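As a toy sketch of the two rules (all numbers are made up for illustration; the `new_` prefix marks the updated credence, which is what each rule outputs):

```python
# Toy comparison of strict vs. Jeffrey conditionalization.

p_h = 0.3              # prior P(H)
p_e_given_h = 0.8      # likelihood P(E|H)
p_e_given_not_h = 0.2  # likelihood P(E|~H)

# Law of total probability: P(E).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|E) and P(H|~E).
p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)

# Strict conditionalization: E is learned with certainty,
# so the new credence in H is simply P(H|E).
new_p_h_strict = p_h_given_e

# Jeffrey conditionalization: E is only learned to degree q < 1,
# so the old conditionals are weighted by the NEW probability of E.
q = 0.9
new_p_h_jeffrey = p_h_given_e * q + p_h_given_not_e * (1 - q)

print(new_p_h_strict, new_p_h_jeffrey)
```

With q = 1, Jeffrey's rule reduces to strict conditionalization, which is one way to see that the two rules are consistent rather than rivals.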
So if I extract a red ball from an urn, should I condition the probability of finding a black ball on the next turn on not having extracted a red ball?
Besides, P(H) is most definitely not equal to P(H|E). P(H) is, on the other hand, demonstrably equal to P(H|E)P(E) + P(H|~E)P(~E), the usual law of total probability. I think we are talking about two completely different things here.
I’m talking about the following issue, found at this link:
It feels to me like you argue from time to time against strawmen:
While probability extends basic logic, it doesn't extend advanced logic (predicate calculus), as David Chapman argues in "Probability theory does not extend logic".
This seems to confuse the idea of having a useful method for hypothesis generation with having a perfect method for hypothesis generation.
Saying that you have one unified theory that can give you the correct hypothesis in every case, without looking at all the alternatives, would plausibly require P = NP. On the other hand, P ≠ NP doesn't mean that there aren't subproblems for which there's an algorithm that finds a perfect, or at least good, hypothesis.
If P ≠ NP, that supports the toolbox paradigm: different tools will perform well for generating hypotheses in different domains, and there's no perfect unified theory.
Arguing that toolbox thinking is better doesn't require arguing that it's impossible to analyse and update beliefs with Bayesian thinking.
I'm not convinced that probability cannot be made to extend to predicate calculus. You need to interpret "for every" and "exists" as transfinite "and" and "or", but those are not some abstruse ingredients impossible to fit in.
As Chapman describes the situation, various mathematicians have put a lot of effort into trying to make a system that extends probability to predicate calculus, but no one has succeeded in creating a coherent system.
There are two ways to disagree with that: 1) Point to a mathematician who actually successfully modeled the extension. 2) Say that no mathematician really tried to do that.
I tend to lean toward this. There has been work to fix and strengthen Cox's theorem, as well as to extend probability to arbitrary preorders or other categories. I've yet to see someone try to extend probability to, say, intuitionistic or modal logic.
There are two common types of strawmen arguments that I’ve encountered within this debate.
One is the strawman argument that Bayesians typically give against frequentists, where they show how a particular frequentist test gives the wrong answer on a particular problem, but a straightforward application of Bayes theorem gives the right answer. Frequentists easily counter that a wiser frequentist would have used a different test for this problem that gives the right answer.
The other strawman argument is the one anti-Bayesians make, where they chastise Bayesians for claiming they have the complete theory of rationality / epistemology and no more work needs to be done. This is obviously false, since no Bayesian has ever claimed this, not even Jaynes. A complete theory would need ways to represent hypotheses, and ways to generate them, and the axioms of probability do not make any additional assumptions about what a hypothesis is.
I’m still looking for a well posed inference problem, where a straightforward application of Bayesian principles gives the wrong answer, but a straightforward application of a different set of principles gets the right answer.
This seems a bit motte-and-bailey. In your post, you argue for Bayesianism as a theory of reasoning. Of course you can say that problems that you can’t solve well with Bayesianism aren’t well posed inference problems. Unfortunately, nature doesn’t care about posing well posed inference problems.
Even if Bayesianism is better for a small subset of reasoning problems, that doesn't imply it's good to reject tool-boxism.
Yep. If Bayes only does one thing, you need other tools to do the other jobs. Which, by the way, implies nothing about converging, or not, on truth.
"Bayesian" has more than one meaning.
What you have there is a defence of the Jaynesian variety, but Yudkowsky makes much stronger claims. For instance, he thinks Bayes can replace science, but you can't replace science with inference alone.
Also, if Bayes is inference alone, it can’t be the sole basis of intelligence.
This is correct. Arguments against Bayesianism ultimately boil down to "it's not enough for AGI". And they are stupid, because nobody ever said it was. But then arguments in favor of Bayesianism boil down to "it's True". And they are stupid, because "True" is not quite the same as "useful". I think this whole debate is pointless, as there is very little the two sides disagree on, besides some wordings.
Having said that, I think the question "how to reason well" should be seen as equivalent to "how to build an AGI", which probably places me on the anti-Bayesian side.
There are some equivalences, but there are also some differences. We already know some uncomputable methods that are optimal among all computable inference methods, and they are perfectly Bayesian.
But when it comes down to computable optimality, we are in deep water.
OP said: “If there is no unified theory of intelligence, we are led towards the view that recursive self-improvement is not possible, since an increase in one type of intelligence does not necessarily lead to an improvement in a different type of intelligence.”
I think that some forms of self-improvement (SI) could be done without recursivity. I created a list of around 30 types of SI, starting from accelerating hardware and up to creating better plans. Most of them are not naturally recursive.
If SI produces a limited 2x improvement at each level, without using the recursivity option, that is still enough to create a 2^30 improvement of the system, or around a 1-billion-times improvement.
(Below is some back-of-envelope, Fermi-like estimation, so the numbers are rather arbitrary and are given just to illustrate the idea.)
It means that a near-human-level AI could reach the power of around 1 billion humans without using the recursivity option. The power of 1 billion humans is probably more than the total power of all human science, where around 50 million researchers work.
Such an AI would outperform human science 20 times over, and could be counted as a superintelligence. Surely, such power is more than enough to kill everybody, or to solve most of humanity's important problems.
Such self-improvement is reachable without understanding the nature of intelligence, and doesn't depend on the assumption that such understanding is needed for SI. So we can't use the argument about the messiness of intelligence as an argument for AI safety.
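The back-of-envelope arithmetic above can be sketched as follows (all numbers are the illustrative guesses from the comment, not measurements):

```python
# Fermi-style estimate of stacked non-recursive self-improvements.

improvement_types = 30        # assumed distinct non-recursive SI types
gain_per_type = 2             # assumed 2x gain from each type

# Multiplying the independent gains: 2**30 is roughly a billion.
total_gain = gain_per_type ** improvement_types
print(total_gain)             # 1073741824

# Compare against the rough size of the world research workforce.
human_researchers = 50_000_000
print(total_gain / human_researchers)  # roughly 21x all of human science
```

This is just compounding multiplicative gains; the later replies in the thread point out that the 2x-per-step assumption is the weak link.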
I think that “recursive self-improvement” means that one improvement leads the AGI to be better at improving itself, not that it must use the same trick every time.
If accelerating hardware allows for better improvements over other dimensions, then better hardware is still part of recursive improvement.
Sure, but it is not easy to prove in each case. For example, if an AI increases its hardware speed two times, and buys two times more hardware, its total productivity would grow 4 times. But we can’t say that the first improvement was done because of the second.
However, if it got the idea that improving hardware is useful, that is a recursive act, as this idea helps further improvements. Moreover, it opens the field to other ideas, like the improvement of improvement. That is why I say that true recursivity happens on the level of ideas, not on the hardware level.
Can’t recursivity be a cycle containing hardware as a node?
As the resident LessWrong "won't someone think of the hardware" person, this comment rubs me up the wrong way a fair bit.
First, there is no well-defined thing called "hardware speed". It might refer to various things: clock speed, operations per second, memory bandwidth, memory response times. Depending on what your task is, your productivity might be bottlenecked by one of these things and not the others. Some, like memory response times, are due to the speed of signals traversing the motherboard, and are hard to improve while we still have the separation of memory and processing.
Getting twice the hardware might give less than twice the improvement. If there is some serial process, then Amdahl's law comes into effect. If the different nodes need to make sure they have a consistent view of something, you need to add latency so that sufficient numbers of them can agree on the state of the data via a consensus algorithm.
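For concreteness, a minimal sketch of Amdahl's law (the 10% serial fraction here is a made-up illustration):

```python
# Amdahl's law: with a serial fraction s of the workload, the speedup
# from n-fold hardware is 1 / (s + (1 - s) / n), capped at 1 / s.

def amdahl_speedup(serial_fraction, n_nodes):
    """Overall speedup from running the parallel part on n_nodes units."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

# Even a 10% serial portion means doubling the hardware gives only ~1.82x,
# and no amount of hardware can ever exceed 10x.
print(amdahl_speedup(0.10, 2))
print(amdahl_speedup(0.10, 10**9))
```

So "twice the hardware" buys less and less as the node count grows, which is the point being made above.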
Your productivity might be bottlenecked by external factors, not processing power at all (e.g. not getting data fast enough). This is my main beef with the sped-up-people thought experiment: the world is moving glacially for them, and data is coming in at a trickle.
If you are searching a space, and you add more compute you might be searching less promising areas with the new compute, so you might not get twice the productivity.
I really would not expect twice the compute to lead to twice the productivity, except in the most embarrassingly parallel situations, like computing hashes.
I think your greater point is weakened, but not by much. We have lots of problems trying to distribute and work on problems together, so human intelligence is not purely additive either.
Thanks for elaborating. I agree that doubling hardware speed will not actually double intelligence. I used this oversimplified example of hardware acceleration as an example of non-recursive self-improvement, and diminishing returns only underline its non-recursive nature.