“Strong Bayesianism” is a strawman already addressed in Toolbox-thinking and Law-thinking. Excerpt:

I’m now going to badly stereotype this conversation in the form I feel like I’ve seen it many times previously, including e.g. in the discussion of p-values and frequentist statistics. On this stereotypical depiction, there is a dichotomy between the thinking of Msr. Toolbox and Msr. Lawful that goes like this:
Msr. Toolbox: “It’s important to know how to use a broad variety of statistical tools and adapt them to context. The many ways of calculating p-values form one broad family of tools; any particular tool in the set has good uses and bad uses, depending on context and what exactly you do. Using likelihood ratios is an interesting statistical technique, and I’m sure it has its good uses in the right contexts. But it would be very surprising if that one weird trick was the best calculation to do in every paper and every circumstance. If you claim it is the universal best way, then I suspect you of blind idealism, insensitivity to context and nuance, ignorance of all the other tools in the toolbox, the sheer folly of callow youth. You only have a hammer and no real-world experience using screwdrivers, so you claim everything is a nail.”
Msr. Lawful: “On complex problems we may not be able to compute exact Bayesian updates, but the math still describes the optimal update, in the same way that a Carnot cycle describes a thermodynamically ideal engine even if you can’t build one. You are unlikely to find a superior viewpoint that makes some other update even more optimal than the Bayesian update, not without doing a great deal of fundamental math research and maybe not at all. We didn’t choose that formalism arbitrarily! We have a very broad variety of coherence theorems all spotlighting the same central structure of probability theory, saying variations of ‘If your behavior cannot be viewed as coherent with probability theory in sense X, you must be executing a dominated strategy and shooting off your foot in sense Y’.”
I currently suspect that when Msr. Lawful talks like this, Msr. Toolbox hears “I prescribe to you the following recipe for your behavior, the Bayesian Update, which you ought to execute in every kind of circumstance.”
This also appears to me to frequently turn into one of those awful durable forms of misunderstanding: Msr. Toolbox doesn’t see what you could possibly be telling somebody to do with a “good” or “ideal” algorithm besides executing that algorithm.
I disagree that this answers my criticisms. In particular, my section 7 argues that it’s practically unfeasible to even write down most practical belief / decision problems in the form that the Bayesian laws require, so “were the laws followed?” is generally not even a well-defined question.
To be a bit more precise, the framework with a complete hypothesis space is a bad model for the problems of interest. As I detailed in section 7, that framework assumes that our knowledge of hypotheses and the logical relations between hypotheses are specified “at the same time”: when we know about a hypothesis we also know all its logical relations to all other hypotheses, and when we know (implicitly) about a logical relation we also have access (explicitly) to the hypotheses it relates. Not only is this false in many practical cases, but I don’t even know of any formalism that would let us call it “approximately true,” or “true enough for the optimality theorems to carry over.”
(N.B. as it happens, I don’t think logical inductors fix this problem. But the very existence of logical induction as a research area shows that this is a problem. Either we care about the consequences of lacking logical omniscience, or we don’t—and apparently we do.)
It’s sort of like quoting an optimality result given access to some oracle, when talking about a problem without access to that oracle. If the preconditions of a theorem are not met by the definition of a given decision problem, “meet those preconditions” cannot be part of a strategy for that problem. “Solve a different problem so you can use my theorem” is not a solution to the problem as stated.
Importantly, this is not just an issue of “we can’t do perfect Bayes in practice, but if we were able, it’d be better.” Obtaining the kind of knowledge representation assumed by the Bayesian laws has computational / resource costs, and in any real decision problem, we want to minimize these. If we’re handed the “right” knowledge representation by a genie, fine, but if we are talking about choosing to generate it, that in itself is a decision with costs.

As a side point, I am also skeptical of some of the optimality results.
Let’s zoom in on the oracle problem since it seems to be at the heart of the issue. You write:
It’s sort of like quoting an optimality result given access to some oracle, when talking about a problem without access to that oracle. If the preconditions of a theorem are not met by the definition of a given decision problem, “meet those preconditions” cannot be part of a strategy for that problem.
Here, it seems like you are doing something like interpreting a Msr. Law statement of the form “this Turing machine that has access to a halting oracle decides provability in PA” as a strategy for deciding provability in PA (as Eliezer’s stereotype of Msr. Toolbox would). But the statement is true independent of whether it corresponds to a strategy for deciding provability in PA, and the statement is actually useful in formal logic. Obviously if you wanted to design an automated theorem prover for PA (applicable to some but not all practical problems) you would need a different strategy, and the fact that some specific Turing machine with access to a halting oracle decides provability in PA might or might not be relevant to your strategy.
I agree that applying Bayes’ law as stated has resource costs and requires formally characterizing the hypothesis space, which is usually (but not always) hard in practice. The consequences of logical non-omniscience really matter, which is one reason that Bayesianism is not a complete epistemology.
I don’t disagree with any of this. But if I understand correctly, you’re only arguing against a very strong claim—something like “Bayes-related results cannot possibly have general relevance for real decisions, even via ‘indirect’ paths that don’t rely on viewing the real decisions in a Bayesian way.”
I don’t endorse that claim, and would find it very hard to argue for. I can imagine virtually any mathematical result playing some useful role in some hypothetical framework for real decisions (although I would be more surprised in some cases than others), and I can’t see why Bayesian stuff should be less promising in that regard than any arbitrarily chosen piece of math. But “Bayes might be relevant, just like p-adic analysis might be relevant!” seems like damning with faint praise, given the more “direct” ambitions of Bayes as advocated by Jaynes and others.
Is there a specific “indirect” path for the relevance of Bayes that you have in mind here?
At a very rough guess, I think Bayesian thinking is helpful in 50-80% of nontrivial epistemic problems, more than p-adic analysis.
How might the law-type properties be indirectly relevant? Here are some cases:
In game theory it’s pretty common to assume that the players are Bayesian about certain properties of the environment (see Bayesian game). Some generality is lost by doing so (after all, reasoning about non-Bayesian players might be useful), but, due to the complete class theorems, less generality is lost than one might think, since (with some caveats) all policies that are not strictly dominated are Bayesian policies with respect to some prior.
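To make the complete class point concrete, here is a minimal sketch in Python. The hypothesis names, likelihoods, and 0-1 loss are invented for illustration (nothing above specifies them): in a toy two-hypothesis testing problem, every decision rule that survives a domination check turns out to be the Bayes rule for some prior, and the one dominated rule is never Bayes-optimal for any prior.

```python
from itertools import product

# Toy decision problem: hypothesis theta in {"A", "B"}, observation x in {0, 1}.
# Likelihoods are assumed numbers for illustration only.
lik = {"A": {1: 0.8, 0: 0.2}, "B": {1: 0.3, 0: 0.7}}

# Deterministic decision rules map each observation to a guessed hypothesis:
# rule = (action on x=0, action on x=1), four rules in total.
rules = list(product(["A", "B"], repeat=2))

def risk(rule, theta):
    """Probability of guessing wrong under hypothesis theta (0-1 loss)."""
    return sum(lik[theta][x] for x in (0, 1) if rule[x] != theta)

risks = {r: (risk(r, "A"), risk(r, "B")) for r in rules}

def dominated(r):
    """True if some other rule is at least as good under both hypotheses
    and strictly better under at least one."""
    ra, rb = risks[r]
    return any(sa <= ra and sb <= rb and (sa, sb) != (ra, rb)
               for sa, sb in risks.values())

# Sweep priors p = P(theta = "A") and record which rule is Bayes-optimal.
bayes_optimal = set()
for i in range(101):
    p = i / 100
    bayes_optimal.add(
        min(rules, key=lambda r: p * risks[r][0] + (1 - p) * risks[r][1]))

# Every rule that shows up as Bayes-optimal is non-dominated, and (in this
# problem) every non-dominated rule is Bayes-optimal for some prior.
assert all(not dominated(r) for r in bayes_optimal)
assert bayes_optimal == {r for r in rules if not dominated(r)}
```

The sweep over priors is a crude stand-in for the theorem's quantifier ("for some prior"); the point is only that the non-dominated rules and the Bayes rules coincide here.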
Sometimes likelihood ratios for different theories with respect to some test can be computed or approximated, e.g. in physics. Bayes’ rule yields a relationship between the prior and posterior probability. Even in the absence of a way to determine the right prior for the different theories, if we can form a set of “plausible” priors (e.g. based on parsimony of the different theories and existing evidence), then Bayes’ rule yields a set of “plausible” posteriors, which can be narrow even if the set of plausible priors was broad.
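A quick sketch of the “plausible priors map to plausible posteriors” move, using the odds form of Bayes’ rule. The likelihood ratio of 100 and the prior range are assumed numbers for illustration, not anything from the discussion:

```python
def posterior(prior, likelihood_ratio):
    """Posterior P(T1 | data) via the odds form of Bayes' rule,
    where likelihood_ratio = P(data | T1) / P(data | T2)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Assumed setup: the test favors T1 by a factor of 100, and we can't settle
# on a prior more precisely than "somewhere between 0.05 and 0.5".
lr = 100.0
plausible_priors = [0.05, 0.2, 0.5]
plausible_posteriors = [posterior(p, lr) for p in plausible_priors]

# A tenfold-wide set of priors maps to a set of posteriors all above 0.8.
print([round(q, 3) for q in plausible_posteriors])
```

With a strong enough likelihood ratio, the conclusion is robust to the unresolved disagreement about the prior, which is the sense in which the posterior set can be narrow even when the prior set is broad.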
Bayes’ rule implies properties of belief updates such as conservation of expected evidence. If I expect my beliefs about some proposition to move in a particular direction in expectation, then I am expecting myself to violate Bayes’ rule, which implies (by the complete class theorem) that, if the set of decision problems I might face is sufficiently rich, I expect my beliefs to yield some strictly dominated decision rule. It is not clear what to do in this state of knowledge, but the fact that my decision rule is currently strictly dominated does imply that I am somewhat likely to make better decisions if I think about the structure of my beliefs and where the inconsistency is coming from. (In effect, noticing violations of Bayes’ rule is a diagnostic tool, similar to noticing violations of logical consistency.)
I do think that some advocacy of Bayesianism has been overly ambitious, for the reasons stated in your post as well as those in this post. I think Jaynes in particular is overly ambitious in applications of Bayesianism, such as in recommending maximum-entropy models as an epistemological principle rather than as a useful tool. And I think this post by Eliezer (which you discussed) overreaches in a few ways. I still think that “Strong Bayesianism” as you defined it is a strawman, though there is some cluster in thoughtspace that could be called “Strong Bayesianism” that both of us would have disagreements with.
(As an aside: as far as I can tell, the entire Ap section of Jaynes’s Probability Theory: The Logic of Science is logically inconsistent.)
If I had to run with the toolbox analogy, I would say that I think of Bayes not as a hammer but more of a 3D printer connected to a tool generator program.
Fasteners are designed to be used with a specific tool (nail & hammer, screw & screwdriver). Imagine, rather, you are trying to use an alien’s box of fasteners – none of them are designed to work with your tools. You might be able to find a tool which works ok, perhaps even pretty well. This is more like the case with a standard statistics toolbox.
If you enter the precise parameters of any fastener (even alien fasteners, whitworth bolts and torx screws) into your tool generator program then your 3D printer will produce the perfect tool for fastening it.
Depending on how well you need to do the job, you might use a similar tool rather than 3D printing a completely new one every time. However, the job will only be done as well as your tool matches the perfect 3D-printed tool. Understanding how your tool differs from the perfect one helps you analyse how well fastened your work will be.
At times you may not know all of the parameters to put into the tool generator but estimating the parameters as best you can will still print (in expectation) a better tool than just picking the best one you can find in your toolbox.
This seems a little forced as an analogy but it’s the best I can come up with!