I do not see why any of Chapman’s examples cannot be given appropriate distributions and modeled in a Bayesian analysis just like anything else:
Dynamical chaos? Very statistically modelable; in fact, you can’t really deal with it at all without statistics in areas like weather forecasting.
Inaccessibility? Very modelable; it’s just a case of missing data & imputation. (I’m told that handling issues like censoring, truncation, rounding, or interval-censoring is considered one of the strengths of fully Bayesian methods and a good reason for using tools like JAGS; in contrast, whenever I’ve tried to deal with one of those issues using regular maximum-likelihood approaches it has been… painful.)
Time-varying? Well, there’s only a huge section of statistics devoted to the topic of time-series and forecasts...
Sensing/measurement error? Trivial; in fact, it is one of the best cases for statistical adjustment (see psychometrics), and arguably dealing with measurement error is the origin of modern statistics (the first instances of least squares came from Gauss and other astronomers dealing with errors in astronomical measurements, and of course Laplace applied Bayesian methods to astronomy as well).
Model/abstraction error? See everything under the heading of ‘model checking’ and things like model-averaging; local favorite Bayesian statistician Andrew Gelman is very active in this area, and no doubt he would be quite surprised to learn that he is misapplying Bayesian methods in that area.
One’s own cognitive/computational limitations? Not just beautifully handled by Bayesian methods + decision theory, but the former is actually offering insight into the former, for example “Burn-in, bias, and the rationality of anchoring”.
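To make the censoring/imputation point concrete, here is a minimal sketch of the kind of thing a proper likelihood treatment of censoring buys you (my own toy example, done with NumPy/SciPy on a grid rather than JAGS; all numbers are invented for illustration): right-censored observations contribute a survival-function term to the likelihood instead of being dropped or treated as exact values.

```python
# Bayesian handling of right-censored data: a censored observation
# contributes P(X > cutoff) to the likelihood, not a density term.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, sigma, cutoff = 2.0, 1.0, 2.5
x = rng.normal(true_mu, sigma, size=200)
censored = x > cutoff                      # e.g. a detector saturates at `cutoff`
obs = np.where(censored, cutoff, x)

mu_grid = np.linspace(0.0, 4.0, 401)       # flat prior over a grid of means
loglik = (stats.norm.logpdf(obs[~censored][:, None], mu_grid, sigma).sum(axis=0)
          + censored.sum() * stats.norm.logsf(cutoff, mu_grid, sigma))

post = np.exp(loglik - loglik.max())       # unnormalized posterior on the grid
post /= post.sum()
mean_naive = obs.mean()                    # treats censored values as real data
mean_bayes = (mu_grid * post).sum()        # models the censoring explicitly
```

The naive mean is biased low because it pretends the saturated readings are exact; the grid posterior mean, which models the censoring, lands much closer to the true value.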
Agreed about chaos, missing data, time series, and noise, but I think the next is off the mark:
Model/abstraction error? See everything under the heading of ‘model checking’ and things like model-averaging; local favorite Bayesian statistician Andrew Gelman is very active in this area, and no doubt he would be quite surprised to learn that he is misapplying Bayesian methods in that area.
He might be surprised to be described as applying Bayesian methods at all in that area. Model checking, in his view, is an essential part of “Bayesian data analysis”, but it is not itself carried out by Bayesian methods. The strictly Bayesian part—that is, the application of Bayes’ theorem—ends with the computation of the posterior distribution of the model parameters given the priors and the data. Model-checking must (he says) be undertaken by other means because the truth may not be in the support of the prior, a situation in which the strict Bayesian is lost. From “Philosophy and the practice of Bayesian statistics”, by Gelman and Shalizi (my emphasis):
In contrast, Bayesian statistics or “inverse probability”—starting with a prior distribution, getting data, and moving to the posterior distribution—is associated with an inductive approach of learning about the general from particulars. Rather than testing and attempted falsification, learning proceeds more smoothly: an accretion of evidence is summarized by a posterior distribution, and scientific process is associated with the rise and fall in the posterior probabilities of various models …. We think most of this received view of Bayesian inference is wrong.
...
To reiterate, it is hard to claim that the prior distributions used in applied work represent statisticians’ states of knowledge and belief before examining their data, if only because most statisticians do not believe their models are true, so their prior degree of belief in all of Θ is not 1 but 0.
If anyone’s itching to say “what about universal priors?”, Gelman and Shalizi say that in practice there is no such thing. The idealised picture of Bayesian practice, in which the prior density is non-zero everywhere and successive models come into favour or pass out of favour by nothing more than updating from data via Bayes’ theorem, is, they say, unworkable.
The main point where we disagree with many Bayesians is that we do not see Bayesian methods as generally useful for giving the posterior probability that a model is true, or the probability for preferring model A over model B, or whatever.
They liken the process to Kuhnian paradigm-shifting:
In some way, Kuhn’s distinction between normal and revolutionary science is analogous to the distinction between learning within a Bayesian model, and checking the model as preparation to discard or expand it.
but find Popperian hypothetico-deductivism a closer fit:
In our hypothetico-deductive view of data analysis, we build a statistical model out of available parts and drive it as far as it can take us, and then a little farther. When the model breaks down, we dissect it and figure out what went wrong. For Bayesian models, the most useful way of figuring out how the model breaks down is through posterior predictive checks, creating simulations of the data and comparing them to the actual data. The comparison can often be done visually; see Gelman et al. (2003, ch. 6) for a range of examples. Once we have an idea about where the problem lies, we can tinker with the model, or perhaps try a radically new design. Either way, we are using deductive reasoning as a tool to get the most out of a model, and we test the model—it is falsifiable, and when it is consequentially falsified, we alter or abandon it.
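The posterior predictive check described in that passage can be sketched in a few lines (my illustration, not Gelman’s code; it assumes NumPy/SciPy and uses a crude plug-in fit in place of full posterior draws of the parameters): fit a normal model to skewed data, simulate replicated datasets under the fitted model, and compare a test statistic between the replications and the observed data.

```python
# Posterior predictive check: simulate replicated data under the fitted
# model and ask whether the observed test statistic looks typical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.exponential(1.0, size=300)         # truth is skewed; model is normal

# Plug-in shortcut: a full treatment would draw (mu, sigma) from their
# posterior for each replication instead of fixing them at point estimates.
mu_hat, sigma_hat = y.mean(), y.std()

t_obs = stats.skew(y)                      # test statistic on the real data
t_rep = np.array([stats.skew(rng.normal(mu_hat, sigma_hat, size=y.size))
                  for _ in range(1000)])   # same statistic on replications

# Posterior predictive p-value: fraction of replications at least as
# skewed as the data; values near 0 or 1 signal model misfit.
ppp = float((t_rep >= t_obs).mean())
```

Here the observed skewness falls far outside the replicated distribution, so the check flags the normal model as broken, which is the kind of consequential falsification the quoted passage describes.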
For Gelman and Shalizi, model checking is an essential part of Bayesian practice, not because it is a Bayesian process but because it is a necessarily non-Bayesian supplement to the strictly Bayesian part: Bayesian data analysis cannot proceed by Bayes alone. Bayes proposes; model-checking disposes.
I’m not a statistician and do not wish to take a view on this. But I believe I have accurately stated their view. The paper contains some references to other statisticians who, they say, are more in favour of universal Bayesianism, but I have not read them.
Model-checking must (he says) be undertaken by other means because the truth may not be in the support of the prior, a situation in which the strict Bayesian is lost.
Loath as I am to disagree with Gelman & Shalizi, I’m not convinced that the sort of model-checking they advocate, such as posterior predictive p-values, is fundamentally and in principle non-Bayesian, rather than merely difficult in practice. I mostly agree with “Posterior predictive checks can and should be Bayesian: Comment on Gelman and Shalizi, ‘Philosophy and the practice of Bayesian statistics’”, Kruschke 2013 - I don’t see why that sort of procedure cannot be subsumed under more flexible and general models in an ensemble approach, with poor fits of particular parametric models found automatically and the posterior shifted to more complex but better-fitting models. If we fit one model and find that it is a bad model, then the root problem was that we were only looking at one model when we knew there were many other models, but out of laziness or limited computation we discarded them all. You might say that when we do an informal posterior predictive check, what we are doing is a Bayesian model comparison of one or two explicit models with the models generated by a large multi-layer network of sigmoids (specifically <80 billion of them)… If you’re running into problems because your model-space is too narrow—expand it! Models should be able to grow (this is a common feature of Bayesian nonparametrics).
This may be hard in practice, but then it’s just another example of how we must compromise our ideals because of our limits, not a fundamental limitation on a theory or paradigm.
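The ensemble idea can be illustrated in miniature (a toy sketch under my own assumptions, using NumPy/SciPy; real model expansion would use far richer families than two fixed likelihoods): put two models side by side, approximate each marginal likelihood on a grid, and watch the posterior mass shift toward the better-fitting model instead of relying on an informal check of a single model.

```python
# If one model fits poorly, a wider model space lets the posterior say so:
# compare two models by grid-approximated marginal likelihood.
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(2)
y = rng.standard_t(3, size=200)            # heavy-tailed data

mu_grid = np.linspace(-5.0, 5.0, 1001)
log_prior = -np.log(mu_grid.size)          # uniform prior over the grid

def log_marginal(logpdf):
    # log p(y | M) = log sum over mu of p(y | mu, M) p(mu)
    ll = logpdf(y[:, None], mu_grid).sum(axis=0)
    return logsumexp(ll + log_prior)

log_m_normal = log_marginal(lambda yy, m: stats.norm.logpdf(yy, m, 1.0))
log_m_t = log_marginal(lambda yy, m: stats.t.logpdf(yy - m, 3))

# Posterior probability of the t-model under equal prior odds: the mass
# shifts almost entirely to the model that can accommodate the tails.
p_t = 1.0 / (1.0 + np.exp(log_m_normal - log_m_t))
```

No separate check step was needed to reject the thin-tailed model here; including a better alternative in the comparison did the work, which is the sense in which expanding the model space can absorb what an informal check would otherwise catch.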
Expanding further on my previous reply, I believe that the claimed (by Gelman and Shalizi) non-Bayesian nature of model-checking is wrong: the truth is that everything that goes under the name of model-checking works, to the extent that it does, insofar as it approximates the underlying Bayesian structure. It is not called Bayesian because it is not an actual, numerical use of Bayes’ theorem, and the reason we are not doing that is that we do not know how: in practice we cannot work with universal priors.
So Bayesian ideas are applicable to the problem of model/abstraction error, but we cannot apply them numerically. In fact, that is pretty much what model/abstraction error means—if we did have numbers, they would be part of the model. Model checking is what we do when we cannot calculate any further with numerical probabilities.
Cf. my analogy here with understanding thermodynamics.
I believe that would be Eliezer’s response to Gelman and Shalizi. I would not expect them to be convinced though. Shalizi would probably dismiss the idea as moonshine and absurdity.
ETA: Eliezer on the subject:

So if a mind is arriving at true beliefs, and we assume that the second law of thermodynamics has not been violated, that mind must be doing something at least vaguely Bayesian—at least one process with a sort-of Bayesian structure somewhere—or it couldn’t possibly work.
ETA: Why is the grandparent at −4? David Chapman and simplicio may be wrong about this, but neither are saying anything stupid, or so much thrashed out in the past as to not merit further words.
One’s own cognitive/computational limitations? Not just beautifully handled by Bayesian methods + decision theory,
Unless there’s been an enormous breakthrough in the past 2 years, I believe this is still a major unsolved problem. Also decision theory is about cooperating with other agents, not overcoming cognitive limitations.
Note that I was speaking of “Bayesianism” as practiced on LW, not of Bayesian statistics the academic field. I do not believe these are the same.
I think that’s absurd, if that’s what he really means. Just because we are not daily posting new research papers employing model-averaging or nonparametric Bayesian statistics does not mean that we do not think those techniques are useful and incorporated into our epistemology, or that we would not consider the standard answers correct; and this argument could be applied to any area of knowledge that LWers might draw upon or consider correct. If we criticize p-values as a form of building knowledge, is that not a part of ‘Bayesian epistemology’ because we are drawing arguments from Jaynes or Ioannidis and did not invent them ab initio?
‘Your physics can’t deal with modeling subatomic interactions, and so sadly your entire epistemology is erroneous.’ ‘??? There’s a huge and extremely successful area of physics devoted to that, and I have no freaking idea what you are talking about. Are you really as ignorant and superficial as you sound, listing as a weakness something which is actually a major strength of the physics viewpoint?’ ‘Oh, but I meant physics as practiced on LessWrong! Clearly that other physics is simply not relevant. Come back when LW has built its own LHC and replicated all the standard results in the field, and then I’ll admit that particle physics as practiced on LW is the same thing as particle physics the academic field, because otherwise I refuse to believe they can be the same.’
I think you’re not being charitable again. Consider the difference between physics as practiced by quantum woo mystics, and physics as practiced by physicists or even engineers. I think that simplicio is referring to a similar (though less striking) tendency for the representative LWer to quasi-religiously misapply and oversell probability theory (which may or may not be the case, but should be argued with something other than uncharitable ridicule).
I think you may be extrapolating much too far from the quote I posted. Also, my statistics level is well below both yours and Chapman’s so I am not a good interlocutor for you.
I think you may be extrapolating much too far from the quote I posted.
I don’t think I am. It’s a very simple quote: “here is a list of n items Bayesian statistics and hence epistemology cannot handle; therefore, it cannot be right.” And it’s dead wrong because all n items are handled just fine.
I think you are being uncharitable. The list was of different types of uncertainty that Bayesians treat as the same, with a side of skepticism that they should be handled the same, not things you can’t model with Bayesian epistemology.
The question is not whether Bayes can handle those different types of uncertainty, it’s whether they should be handled by a unified probability theory.
I think the position that we shouldn’t (or don’t yet) have a unified uncertainty model is wrong, but I don’t think it’s so stupid as to be worth getting heated about and being uncivil.
I think the position that we shouldn’t (or don’t yet) have a unified uncertainty model is wrong
Did somebody solve the problem of logical uncertainty while I wasn’t looking?
but I don’t think it’s so stupid as to be worth getting heated about and being uncivil.
I disagree that Gwern is being uncivil. I don’t think Chapman has any ground to criticize LW-style epistemology when he’s made it abundantly clear he has no idea what it is supposed to be. (Indeed, that’s his principal criticism: the people he’s talked to about it tell him different things.)
It’d be like if Berkeley asked a bunch of Weierstrass’ first students about their “supposed” fix for infinitesimals. Because the students hadn’t completely grasped it yet, they gave Berkeley a rope, a rubber hose, and a burlap sack instead of giving him the elephant. Then Berkeley goes and writes a sequel to the Analyst disparaging this “new Calculus” for being incoherent.
In that world, I think Berkeley’s the one being uncivil.
gwern, I am curious. You do a lot of practical data analysis. How often do you use non-Bayesian methods?
Pretty frequently (if you’ll pardon the pun). Almost all papers are written using non-Bayesian methods, people expect results in non-Bayesian terms, etc.
Besides that: I decided years ago (~2009) that as appealing as Bayesian approaches were to me, I should study ‘normal’ statistics & data analysis first—so I understood them and why I didn’t want to use them before I began studying Bayesian statistics. I didn’t want to wind up in a situation where I was some sort of Bayesian fanatic who could tell you how to do a Bayesian analysis but couldn’t explain what was wrong with the regular approach or why Bayesian approaches were better!
(I think I’m going to be switching gears relatively soon, though: I’m working with a track coach on modeling triple-jumping performance, and the smallness of the data suggests it’ll be a natural fit for a multilevel model using informative priors, which I’ll want to read Gelman’s textbook on, and that should be a good jumping off point.)
Random question—if you were to recommend a textbook or two, from frequentist and Bayesian analysis both, to a random interested undergraduate...
(As you might guess, not a hypothetical, unfortunately.)
One’s own cognitive/computational limitations? Not just beautifully handled by Bayesian methods + decision theory, but the former is actually offering insight into the former, for example “Burn-in, bias, and the rationality of anchoring”.

Judging by the abstract, I assume you meant to write that the latter is offering insight into the former?
Note that I was speaking of “Bayesianism” as practiced on LW, not of Bayesian statistics the academic field. I do not believe these are the same.

I believe Chapman is writing a more detailed critique of what he sees here; I will be sure to link you to it when it comes.