It seems like a consistent element of everyone’s explanations so far has been that the bayesian has reasonable priors and is not naive. I’ll respond to everyone at once here (hope that’s okay) to keep things organized. cc @michaeldickens, @cole-wyeth, @gavin-runeblade, @thomas-castriensis (thanks everyone for the explanations so far).
edit: IDK how to mention/tag people :(
One thing I notice, the Bayesian needs a prior for ‘unknown scam’ but shouldn’t they also consider the inverse? Like an unknown bank error in your favor type situation. I suppose that if the prior for that is small enough it doesn’t matter so it’s ignored as insignificant. Is the significance a matter of the ratio between priors of unknowns? Like it’s way more likely to be an unknown scam than an unknown bank error, so we ignore the bank error.
Something bigger doesn’t quite sit right with me: I don’t think having priors and not being naive is our default state for knowledge frontiers (hence the initial framing and restrictions on knowledge of scams). Rather, the default is that we are missing priors (or they’re wildly inaccurate) and we are naive to the ways in which we are naive (there is a lot that we don’t know we don’t know).
Here’s a concrete example: consider a Bayesian in 1901 (before general relativity). Almost all measurements of planetary motion agreed with Newton’s universal gravity (UG) and Mercury’s orbit was our main (only?) counterexample. We had multiple (incorrect) theories about how Mercury’s orbit might be perturbed by this body or that. The most recently discovered planet at the time (Neptune) was predicted via UG (0.125 of all known planets).
My understanding is that the Bayesian will, upon seeing measurements of all the planets’ orbits, update based on successful predictions. Most of the predictions are correct for UG alone, and all of them are correct for UG + New Mystery Planet. Let’s call these hypotheses UG and NMP, and we also have UG_other for all other UG-based theories.
Now, based on the reasoning used (if I understand it correctly), the Bayesian already has reasonable priors (at least they believe so), and the general relativity hypothesis (playing the role of the scam hypothesis) is already accounted for in the Bayesian’s calculations and updates. It doesn’t matter whether the Bayesian is aware of how it works or not. [Unsure] The probability of GR being true is independent of whether the Bayesian knows about it or not[1]. Let’s call all the other hypotheses (excluding any UG hypotheses) just O.
Okay, so in the case of observing something uncontroversial, like Mars data, the data is very likely under UG and under NMP, and less likely under UG_other since lots of UG+something theories are inconsistent with it; similarly for O, for the same reason (there are infinitely many hypotheses in O that are inconsistent and infinitely many that are consistent). So we update in favor of UG and NMP and decrease UG_other and O.
Then, when observing Mercury, we see , , , and . So we update to increase , and decrease , and (which takes the biggest hit).
Have I gone wrong somewhere? I’m not confident that I’m at the right midstate here. If there isn’t a problem with my logic then this result feels intuitively wrong because we’re getting more confident in NMP and no amount of observing will allow O (containing GR) to catch up.
Also it seems like pretty quickly decouples from the probability of the next planet we find being via prediction (like Neptune), so we can ignore it after a bit when updating .
that’s how it seems it must be to me. I can’t quite put my finger on it, but it feels wrong that learning about an idea independently of all observations of data somehow updates things retroactively.
Bayesianism has no rules for what someone’s priors should be. It has rules about how to progress from a state of having priors.
There’s a reason that Einstein did not get his Nobel Prize for the theory of relativity. At the time the prize was given, the Nobel Prize committee did not believe that the predictions about Mercury’s orbit were strong enough evidence for relativity to give him the Nobel Prize for it. Besides Mercury’s orbit there was also the Michelson-Morley experiment.
To the extent that you were taught in school that the Michelson-Morley experiment and Mercury’s orbit provided definite evidence for relativity, that’s a retrospective accounting from people who already knew it to be true and not the perspective of the physicists who gave Einstein his Nobel Prize.
Whether or not physicists at the time should have updated more strongly in the direction of seeing relativity as proven depends a lot on what you believe about the merits of alternative explanations for the observations, how well those fitted the data, and how likely you consider the measurement of Mercury’s orbit to be correct.
[Unsure] The probability of GR being true is independent of whether the Bayesian knows about it or not
In Bayesianism, probabilities are not independent of the observer’s model, whereas frequentism has a notion of observer-independent probabilities.
The general mechanism with the mail order scam you talked about is called survivorship bias. If you take the question of how high the existential risk of nuclear war happens to be, this matters. If you just observe that we have now had nuclear weapons for a long time, it might be wrong to update with each passing year toward a lower chance of nuclear war, because you would not be around to observe reality if everyone had been killed by nuclear war. That’s why we need to look at Petrov and Arkhipov to get an understanding of near misses, and we treat both of them as heroes on LessWrong.
[Unsure] The probability of GR being true is independent of whether the Bayesian knows about it or not
In Bayesianism, probabilities are not independent of the observer’s model, whereas frequentism has a notion of observer-independent probabilities.
Sure, but that’s not exactly what I’m asking about.
From your perspective, you have a prior P(bayesianism). I’ll call this P(B) for short. There’s also some prior P(not bayesianism) that covers all other epistemologies, both those which you know about and those you do not. Does P(B) change:
when you learn of other epistemologies? (I think this breaks up a bayesian’s hypothesis space but there’s no new data)
when you learn about other epistemologies? (Like details about how they work)
when you learn that you had an obstructive misunderstanding about other epistemologies? (Like, you thought it worked like X but actually it was Y)
other events?
You can replace P(B) with P(GR) if you like and ‘epistemologies’ with ‘theories of gravity’. The underlying thing I’m unsure about is where the update to the prior happens and why. If it is when you learn about the new theory, then it seems like an admission that the P(other) option is just outright wrong, so working backwards I guess bayesianism stays consistent by not updating the prior in response to learning a new idea.
Sorry for the delayed reply. My comments are being held for review.
I think you need to make a clearer distinction between a hypothetical perfect bayesian reasoner, who would know all possible hypotheses from the beginning and only narrow them down through observation, and the things humans do to try to approximate that, which will sometimes involve going back and changing the prior when a new hypothesis has been thought of.
[Unsure] The probability of GR being true is independent of whether the Bayesian knows about it or not[1]
Keep in mind that these “probabilities” are subjective assessments of probability based on an individual’s prior knowledge, not facts about reality. Two Bayesians with different prior experience may disagree about how probable something is (/seems to them), but reality will not disagree or debate with itself about the truth of the matter, or assign probability to different possibilities (mumble mumble I don’t really understand quantum mechanics and am pretending it doesn’t matter for the purpose of this conversation).
Whether or not General Relativity is true is unaffected by any probabilities any Bayesian may put on its truth or falsehood when reasoning about the evidence they’ve seen so far. But whether or not a particular Bayesian finds General Relativity to be probably true, is definitely affected by whether they know about it or not. Keep a clear distinction in your mind between “the probability a Bayesian reasoner assigns to some fact being true” and “whether or not that fact is true in reality”—these are not the same.
This is a “map vs. territory” distinction. Bayesian probabilities go into a mental model of how the world works (map), while how the world actually works is separate from the map.
Keep in mind that these “probabilities” are subjective assessments of probability based on an individual’s prior knowledge, not facts about reality. Two Bayesians with different prior experience may disagree about how probable something is (/seems to them),
Doesn’t that mean we should expect that Bayesians often disagree and they have no way to resolve it except consulting reality (i.e., an experiment)?
If that’s the case, why bother with Bayesianism at all? It seems like any situation where people needed to agree on the truth of something, Bayesianism wouldn’t help:
They already agree (‘scientific consensus’ and similar situations) -- Bayesianism doesn’t offer anything here.
One person disagrees, everyone else agrees—without more data no progress is possible, so either we know a decisive experiment that will resolve the issue (in which case we don’t need Bayesianism), or we don’t, in which case Bayesianism can’t help.
Lots of disagreement—similar to above but more chaotic.
Also say there is an experiment, is there any standard or agreement among bayesians about how to weight credence? (when should it be weak or strong? etc) Because if there isn’t, they might not even be able to agree on what experiment to do or if it will matter.
The FDA just decided to allow Bayesian statistics to be used to argue for drug approval. If a drug company wants approval they will need to state their priors and then how they updated based on the studies they ran. If the FDA does not agree, they might say “Well, I don’t think those priors are reasonable, please use X as priors”. I expect that the exact rules of how the process goes will evolve with time.
In another practical application space, you have superforecasting, where credence calibration is important. You make progress by believing people with better Brier scores (or log loss). If some scientists in a field can make predictions about experimental outcomes with better Brier scores, then they seem to understand more about the domain and should be trusted.
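To make the scoring concrete, here is a minimal sketch of how a Brier score and a log loss could be computed for yes/no forecasts. The forecasts, the outcomes, and the function names `brier_score` and `log_loss` are made up for illustration; lower is better for both measures.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def log_loss(forecasts, outcomes):
    """Mean negative log of the probability assigned to what actually happened."""
    return -sum(math.log(p if o == 1 else 1 - p)
                for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical data: four yes/no questions and how they resolved.
outcomes = [1, 0, 1, 1]
confident = [0.8, 0.2, 0.7, 0.9]   # leans the right way every time
hedger = [0.5, 0.5, 0.5, 0.5]      # always answers 50%

print(brier_score(confident, outcomes), log_loss(confident, outcomes))
print(brier_score(hedger, outcomes), log_loss(hedger, outcomes))
```

In this toy data the forecaster who commits to well-founded probabilities scores better on both measures than the one who always says 50%, which is the sense in which a track record can be compared across people.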
If so, I think we might be talking at cross-purposes. I’m asking about Bayesian epistemology particularly, not statistics. (Nor about estimations based on other people’s beliefs, since that doesn’t seem like an epistemology, except maybe JTB with more steps.)
If not, I’m not sure what your point is particularly.
The question “How do we know which drugs work and are beneficial to patients?” is an applied epistemology question. Looking at how it gets answered by a sophisticated system tells you how epistemology actually works in practice instead of how philosophers think it’s supposed to work in their ivory tower. If you use Bayesian statistics you want an epistemology behind that use that guides you in how you use the statistics to reason.
Superforecasting is much more about Bayesian epistemology than about Bayesian statistics. You have superforecasters who would say they are Bayesians but can’t write down Bayes’ theorem.
You seem to have some idea that epistemology is supposed to be “objective”. It’s supposed to give you the answer from God’s view. A lot of Western science is built around wanting to reach God’s view. The problem is that God doesn’t exist. According to Nietzsche, he’s dying. Bayesian epistemology is an epistemology without God, which means that you have to deal with your beliefs and other people’s beliefs.
The reason to bother with Bayesianism is not because it helps you to see the world from God’s view but because it has practical utility in applied epistemology, with FDA drug approval and superforecasting being two examples.
You seem to have some idea that epistemology is supposed to be “objective”. It’s supposed to give you the answer from God’s view.
Objective in a sense, but I’m not sure how I’ve given you the ‘God’s view’ impression. I think epistemology should be objective in that it should work universally via the same rules, and that people can discuss both ideas and the world (evidence) in such a way to reach agreement in an objective sense. But they can be wrong, there’s no infallible method of getting to the truth. They can also objectively agree on each other’s subjective states.
The reason to bother with Bayesianism is not because it helps you to see the world from God’s view but because it has practical utility in applied epistemology, with FDA drug approval and superforecasting being two examples.
Putting aside superforecasting because I don’t know much about it, using bayesian statistics for statistical analysis is fine.
But that’s not what’s happening on LW. When people on LW talk about their priors and updating them, they’re not talking about bayesian statistics, they’re talking about epistemology, about what ideas are true. I think those are fundamentally different things and they work in different ways (and it seems like you do, too). I’m here because LW is the largest bayesian forum (or if not the largest it’s better than reddit for discussion, point is, it’s the best option for talking to bayesians).
The idea that we should apply bayesian statistics to epistemic tasks is what I’m interested in discussing.
If superforecasting is important to discuss, can you link me to something that represents what you mean by it?
If you use frequentist statistics you can just claim that your data follows a normal distribution (which it objectively most likely doesn’t; even the archetypal example of height violates a normal distribution, because there are more people with dwarfism than the normal distribution assumes). On the other hand, to use Bayesian statistics you do need to decide on priors, which does involve subjective decisions.
Objective in a sense, but I’m not sure how I’ve given you the ‘God’s view’ impression. I think epistemology should be objective in that it should work universally via the same rules, and that people can discuss both ideas and the world (evidence) in such a way to reach agreement in an objective sense.
The fact that you aren’t explicitly thinking of God doesn’t mean that the idea of objectivity does not come out of a Christian scientific tradition which had as a key motivation trying to see things from God’s view. Your ideas of how you think it should work have that theistic origin.
Bayesianism as discussed on LessWrong is about how it would be good for an agent to reason, and that’s a different goal. Eliezer was interested in it because he wanted to know what’s true about how agents reason effectively, in order to be able to say things about superintelligence.
Philip E. Tetlock separately was interested in the epistemic task of how to reason about what to believe. After the Iraq war the US military thought they had problems with epistemology, as shown by the fact that they got the WMD question so horribly wrong. Out of that there was an IARPA tournament where Tetlock’s team won. Tetlock runs GJOpen, which solves epistemic problems for entities that want probabilities for certain events happening. He got some grant money from OpenPhil. There’s also Metaculus, which comes out of the rationalist sphere. His book Superforecasting is a good resource if you want to understand how the kind of Bayesianism where people don’t explicitly use statistics works in practice.
Doesn’t that mean we should expect that Bayesians often disagree and they have no way to resolve it except consulting reality (i.e., an experiment)?
Short answer: Yes.
Longer answer: Two Bayesians who start out with the same prior probabilities, and see the same evidence, should update their posterior probabilities in the same way, and so their mental models should stay consistent with each other. Two Bayesians who start out with different prior probabilities, but see the same evidence, should update their posterior probabilities in ways that are predictable to each other, and in line with the evidence. That is, if one reasoner (A)’s prior probability that (for example) General Relativity is true was high, while another (B)’s was low, then when an experiment is run which provides evidence for general relativity, A’s estimates of General Relativity’s likelihood of being true will change less than B’s (because B’s priors were more wrong), but both will update in a direction and to an extent that is predictable to either of them. As they see more and more of the same evidence, their models of the world should converge.
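As a rough illustration of that convergence claim, here is a small sketch with two reasoners who hold different priors on a single yes/no hypothesis H and see the same run of results. The priors, the likelihoods, and the function name `update` are all assumptions chosen for the example, and the sketch assumes both reasoners agree on how likely each result is under H and under not-H.

```python
def update(prior, lik_h, lik_not_h):
    """Bayes' rule for a binary hypothesis H after one observation."""
    num = prior * lik_h
    return num / (num + (1 - prior) * lik_not_h)

# Assumed likelihoods: each result is three times as likely if H is true.
LIK_H, LIK_NOT_H = 0.75, 0.25

a, b = 0.9, 0.1   # A starts confident in H, B starts skeptical
for step in range(1, 9):
    a, b = update(a, LIK_H, LIK_NOT_H), update(b, LIK_H, LIK_NOT_H)
    print(f"after {step} results: A={a:.4f}  B={b:.4f}  gap={a - b:.4f}")
```

On the first result A moves by less than B does, and the gap between them shrinks with every further result. If the two reasoners disagreed about the likelihoods themselves, this simple picture would no longer apply.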
This is all assuming an ideal Bayesian reasoner with practically-unlimited computing power who doesn’t cheat or decide not to reason according to Bayesian rules when it becomes inconvenient, and humans don’t meet those constraints. But, there’s math to say how much you should update given particular evidence. So:
Also say there is an experiment, is there any standard or agreement among bayesians about how to weight credence?
Yep. “How to weight credence” is a bit unclearly stated, but there’s Bayes’ formula, which tells you how to update your probabilities based on evidence, and that might be what you’re getting at?
Which is (one reason) why bother with Bayesianism at all. It’s a method of approaching consensus when working under uncertainty. It’s kind of an “agreeing to the rules of the game” situation, where “the rules” are a mathematical equation that says how probabilities must change when people are disagreeing (and “must” here carries the same level of mathematical strength as saying “2+2 must equal 4”, it’s not a thing that was decided by committee). If for example you say it’s 95% unlikely/5% likely that something will happen under your idea of how the world works, and then it happens, if you’re playing fair, you make a big update, and if you put numbers on it, Bayes’ rule tells you what your new numbers should be. If you don’t like the new numbers, you have to either acknowledge that what you said your priors were was incorrect, or that what you said your likelihood estimates of different outcomes were was incorrect, so either you retroactively revise how you used to think the world works, or you retroactively revise what you thought would happen and how confident you were, both of which are kind of awkward and embarrassing. And the people you’re disagreeing with, if you don’t make an appropriately sized update given what you told them your priors and likelihood estimates were, can point this out as a fact. And if you’re not very confident, or you don’t think particular evidence should carry much weight, you can express that in a way that makes it clear how much you’re going to update based on whatever evidence you see, before you see it, so it isn’t like:
“I think x is definitely wrong, and y will provide strong evidence”
“OK, so y didn’t go how I thought, now I think x is only almost certain to be false”
It’s like:
“I think X is a% likely to be false, and Y experiment will turn out the way I expect with b% likelihood as a result.”
“Oh. Ok, well, I guess now I’m down to c% likelihood that X is false. Shoot.”
And instead of being like “it’s not fair that you moved from definitely to almost certain based on something you said would provide strong evidence but didn’t go the way you expected”, the reaction is “yep, that math is correct, you updated how you should have given your priors”. And before you get to that point, pre-experiment, you can argue over whether b% likelihood is reasonable, where it’s hard to argue about the correct meaning of the word “strong”.
And once you get really familiar with doing this (I’m still not great at it), you know intuitively how much putting X% probability on a particular outcome means you’re going to have to change your views if it doesn’t happen, and you become appropriately cautious (“calibrated”) in your estimates, and your saying things in probabilities conveys a lot of information to other people who are also familiar with talking this way.
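For a sense of how the a%, b%, c% exchange could work out numerically, here is a sketch using Bayes’ rule for a single yes/no claim. Every number is an illustrative assumption, including the reading of b% as the chance that the experiment goes the expected way if X really is false, and the function name `posterior` is made up.

```python
def posterior(prior, lik_if_true, lik_if_false):
    """P(claim | evidence) via Bayes' rule for a yes/no claim."""
    joint_true = prior * lik_if_true
    joint_false = (1 - prior) * lik_if_false
    return joint_true / (joint_true + joint_false)

# All numbers below are illustrative assumptions.
p_x_false = 0.95               # a%: stated prior that X is false
p_surprise_if_x_false = 0.05   # 1 - b%: chance of the unexpected result if X is false
p_surprise_if_x_true = 0.80    # chance of that same result if X is true

# The claim being tracked is "X is false"; the evidence is the unexpected result.
c = posterior(p_x_false, p_surprise_if_x_false, p_surprise_if_x_true)
print(f"P(X is false) after the surprising result: {c:.3f}")
```

With these numbers the credence that X is false drops from 95% to roughly 54%, and anyone who was told the three inputs beforehand can check that figure for themselves.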
All of that is in idealized theory among people who are quite smart and can do lots of calculation in their heads. Lots of people also LARP it and use Bayesian-sounding words without actually having the deeper intuitive understanding of what what they’re saying means.
Doesn’t that mean we should expect that Bayesians often disagree and they have no way to resolve it except consulting reality (i.e., an experiment)?
Short answer: Yes.
Okay.
FWIW, CF says that two people should be able to use discussion to resolve most disagreements. Some disagreements do require experiments, but only when we have multiple contradictory theories that otherwise agree. (e.g., UG and GR mostly agree so it’s reasonable that two scientists might disagree on which is correct but they should be able to agree on a decisive experiment.) When such discussions fail there are ways to deal with that, too (so you can go address the reason for failure then resume the original discussion).
[...] that is, if one reasoner (A)’s prior probability that (for example) General Relativity is true was high, while another (B)’s was low, then when an experiment is run which provides evidence for general relativity, A’s estimates of General Relativity’s likelihood of being true will change less than B’s (because B’s priors were more wrong) [...]
Doesn’t that require some assumptions about their hypothesis spaces? You assume that B’s priors are more wrong, but we don’t know that from just saying that [1]. We need to know about all their other priors, too.
I agree the update can be predictable to one another if they specify and communicate the necessary constants / weightings / etc. Re specification and communication: for something that seems important to Bayesians, I’m surprised that I don’t see more of it.
As they see more and more of the same evidence, their models of the world should converge.
I’m not convinced of this but I agree it seems that intuitively, most of the time at least, they should.
This is all assuming an ideal Bayesian reasoner with practically-unlimited computing power who doesn’t cheat or decide not to reason according to Bayesian rules when it becomes inconvenient.
This surprised me to read and I have some immediate questions.
When is Bayesian reasoning inconvenient?
Do you think there are other valid ways to reason?
If so, why do we need bayesianism? Is Bayesianism incomplete?
Or if not, how would an ideal Bayesian reasoner ever be fooled into using bad/broken/misleading methods of reasoning?
Are there other more energy efficient ways of reasoning to get to the same answers reliably? Why use bayesianism at all in that case?
Is cheating ever consistent with bayesianism?
Which is (one reason) why bother with Bayesianism at all. It’s a method of approaching consensus when working under uncertainty.
Sure, but so is “do whatever Max says”. Consensus on its own has nothing to do with truth. Now, I guess you mean something a little more specific but I don’t want to put words in your mouth.
Here’s an example of how Bayesianism fails to produce consensus (or it seems like that to me):
Two aliens (A1, A2) who do not know about relativity are communicating. A1 observes an event E1 happens before E2, but A2 observes E2 before E1. They will each update in different directions.
Is there a Bayesian way to resolve this?
[...] if you’re not very confident, [...]
How does Bayesianism deal with confidence? Do all priors have errors attached? If so, how are errors handled in the maths?
Also, if the error bars are large enough, is convergence still guaranteed? Or what if our errors are off due to things like overconfidence?
[...] you can argue over whether b% likelihood is reasonable [...]
I’m not sure how this would help Bayesians, what does a successful discussion of this kind look like?
The reason I’m not sure is unanswered questions like what are they observing to cause updates? and which priors are being updated?
I think the missing step is that you’re updating more than you’re updating and . If we use actual numbers, let’s say the Bayesian comes in with , , and . The update based on observing Mercury should be to remove from the standing and renormalize, dividing the remaining probabilities by their sum. So your new probabilities are , , and
When new evidence comes in that falsifies NMP, P(O) jumps up to 0.5.
I don’t have a reason for setting them equal, no. The prior probabilities could be arbitrarily split between the remaining options.
Yes, that’s correct. If we were to keep experimenting and observing, we would find some data that would have essentially 0 likelihood showing up under
That last question is trickier. If there’s no new data either way, but it predicts reality better than most hypotheses in , you can split it out into and , conserving the sum so that . (Granted, if there are other hypotheses within that line up with reality, then you should split those out as well.)
Then you can compare which specific predictions makes that does not. Once you perform experiments and get data that is extremely unlikely under but likely under , then you rule out and are left with and . Any hypotheses in that inconsistent under that new data also get ruled out, effectively increasing the probability assigned to .
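The two moves described here, dropping a falsified hypothesis and renormalizing, then splitting a catch-all while conserving its total, can be sketched as follows. The starting numbers are hypothetical and are not the ones from the comment above; the helper names, the label GR for the split-out theory, and the 0.5 share are likewise assumptions.

```python
def renormalize(probs):
    """Scale the remaining probabilities so they sum to 1."""
    total = sum(probs.values())
    return {h: p / total for h, p in probs.items()}

def falsify(probs, hypothesis):
    """Drop a hypothesis the data has ruled out, then renormalize the rest."""
    return renormalize({h: p for h, p in probs.items() if h != hypothesis})

def split(probs, parent, child, share):
    """Carve `child` out of `parent`, conserving their combined probability."""
    out = dict(probs)
    out[child] = probs[parent] * share
    out[parent] = probs[parent] * (1 - share)  # parent now means "the rest"
    return out

# Hypothetical priors, chosen only for illustration.
beliefs = {"UG": 0.6, "NMP": 0.3, "O": 0.1}
beliefs = falsify(beliefs, "UG")            # treat Mercury as ruling out UG alone
beliefs = split(beliefs, "O", "GR", 0.5)    # the 0.5 share is arbitrary
print(beliefs)
```

Treating a hypothesis as exactly falsified is an idealization (a likelihood of zero); with a small nonzero likelihood the same arithmetic would shrink it rather than delete it.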
If we were to keep experimenting and observing, we would find some data that would have essentially 0 likelihood showing up under .
I’m not sure this is true in general. Sometimes we only figure out what data to look for to disprove theory 1 after someone comes up with theory 2. So before that, a Bayesian might be gathering lots of data and updating theory 1’s prior towards P=1, but they’d be being misled with no way to know they were being misled. But also, where does theory 2 come from? Either it was something the Bayesian could have thought up themselves (in which case why were they updating in the wrong direction?), or Bayesianism is incomplete and theory 2 was generated in a non-Bayesian way (and in this case it’s hard to see Bayesianism as anything but inferior to this other method; at the very least the sum of both methods is definitely better than Bayesianism alone).
(Granted, if there are other hypotheses within that line up with reality, then you should split those out as well.)
There are an infinite number of those, though. Putting that aside, there are other mathematical constructions distinct from GR that are in O, but we don’t know what they are yet (just like GR prior to publication).
And it seems like in order to split sensibly we’d need to know the distribution of probabilities over these unknown hypotheses in the hypothesis space. If we used a rule like always split in half, then our result is dependent on which order we learn the theories in (which seems illogical).
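The order-dependence worry can be shown with a tiny sketch. It assumes a rule of "give each newly learned theory half of whatever the catch-all O currently holds"; the starting credences and the theory names T1 and T2 are hypothetical.

```python
def split_in_half(beliefs, new_theory, catch_all="O"):
    """Give a newly learned theory half of whatever the catch-all currently holds."""
    out = dict(beliefs)
    out[new_theory] = beliefs[catch_all] / 2
    out[catch_all] = beliefs[catch_all] / 2
    return out

start = {"UG": 0.8, "O": 0.2}   # hypothetical starting credences

# Learn theory T1 first, then T2 ...
first = split_in_half(split_in_half(start, "T1"), "T2")
# ... versus learning T2 first, then T1.
second = split_in_half(split_in_half(start, "T2"), "T1")

print(first)
print(second)
```

Under this rule whichever theory is learned first ends up with twice the credence of the one learned second, even though no data distinguishes the two orderings.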
Then you can compare which specific predictions makes that does not.
But UG_other makes all possible predictions (since it’s a group of many hypotheses). Or close to ‘all’, at least.
Any hypotheses in that inconsistent under that new data also get ruled out, effectively increasing the probability assigned to .
Oh and I think I just saw an issue with your original reasoning:
When new evidence comes in that falsifies NMP, P(O) jumps up to 0.5.
The evidence that falsified NMP also falsified some hypotheses of O (but not GR). So P(O) would not be 0.5.
It seems like a consistent element of everyone’s explanations so far has been that the bayesian has reasonable priors and is not naive. I’ll respond to everyone at once here (hope that’s okay) to keep things organized. cc @michaeldickens, @cole-wyeth, @gavin-runeblade, @thomas-castriensis (thanks everyone for the explanations so far).
edit: IDK how to mention/tag people :(
One thing I notice, the Bayesian needs a prior for ‘unknown scam’ but shouldn’t they also consider the inverse? Like an unknown bank error in your favor type situation. I suppose that if the prior for that is small enough it doesn’t matter so it’s ignored as insignificant. Is the significance a matter of the ratio between priors of unknowns? Like it’s way more likely to be an unknown scam than an unknown bank error, so we ignore the bank error.
Something bigger doesn’t quite sit right with me: I don’t think having priors and not being naive is our default state for knowledge frontiers (hence the initial framing and restrictions on knowledge of scams). Rather, the default is that we are missing priors (or they’re wildly inaccurate) and we are naive to the ways in which we are naive (there is a lot that we don’t know we don’t know).
Here’s a concrete example: consider a Bayesian in 1901 (before general relativity). Almost all measurements of planetary motion agreed with Newton’s universal gravity (UG) and Mercury’s orbit was our main (only?) counterexample. We had multiple (incorrect) theories about how Mercury’s orbit might be perturbed by this body or that. The most recently discovered planet at the time (Neptune) was predicted via UG (0.125 of all known planets).
My understanding is that the Bayesian will, upon seeing measurements of all the planets’ orbits, update based on successful predictions. Most of the predictions are correct for UG alone, and all of them are correct for UG + New Mystery Planet. Let’s call these hypotheses and and we also have for all other UG based theories.
Now, based on the reasoning used (if I understand it correctly), the Bayesian already has reasonable priors (at least they believe so), and the general relativity hypothesis (playing the role of the scam hypothesis) is already accounted for in the Bayesian’s calculations and updates. It doesn’t matter that the Bayesian is unaware of how it works or not. [Unsure] The probability of GR being true is independent of whether the Bayesian knows about it or not [1] . Let’s call all the other hypotheses (excluding any UG hypotheses) just .
Okay so in case of observing something uncontroversial, like Mars data, , , and since lots of UG+something theories are inconsistent, similarly for the same reason (there are many (infinite) hypotheses in that are inconsistent and infinitely many hypotheses that are consistent). So we update in favor of and and decrease and .
Then, when observing Mercury, we see , , , and . So we update to increase , and decrease , and (which takes the biggest hit).
Have I gone wrong somewhere? I’m not confident that I’m at the right midstate here. If there isn’t a problem with my logic then this result feels intuitively wrong because we’re getting more confident in and no amount of observing will allow (containing GR) to catch up.
Also it seems like pretty quickly decouples from the probability of the next planet we find being via prediction (like Neptune), so we can ignore it after a bit when updating .
that’s how it seems it must be to me. I can’t quite put my finger on it, but it feels wrong that learning about an idea independently of all observations of data somehow updates things retroactively.
Bayesianism has no rules for what someone’s priors should be. It has rules about how to progress from a state of having priors.
There’s a reason that Einstein did not get his Nobel Prize for the theory of relativity. At the time the prize was given, the Nobel Prize committee did not believe that the predictions about Mercury’s orbit were strong enough evidence for relativity to give him the Nobel Prize for it. Besides Mercury’s orbit there was also the Michelson-Morley experiment.
To the extent that you were taught in school that the Michelson-Morley experiment and Mercury’s orbit provided definite evidence for the theory of relativity, that’s a retrospective accounting from people who already knew it to be true, not the perspective of the physicists who gave Einstein his Nobel Prize.
Whether or not physicists at the time should have updated more strongly toward seeing relativity as proven depends a lot on what you believe about the merits of alternative explanations for the observations, how well those fitted the data, and how likely you consider the measurement of Mercury’s orbit to be correct.
In Bayesianism, probabilities are not independent of the observer’s model, whereas frequentism has a notion of observer-independent probabilities.
The general mechanism with the mail order scam you talked about is called survivorship bias. If you take the question of how high the existential risk from nuclear war happens to be, this matters. If you just observe that we have now had nuclear weapons for a long time, it might be wrong to update with each passing year toward a lower chance of nuclear war, because you would not be around to observe reality in case everyone got killed by nuclear war. That’s why we need to look at Petrov and Arkhipov to get an understanding of near misses, and we treat both of them as heroes on LessWrong.
Sure, but that’s not exactly what I’m asking about.
From your perspective, you have a prior P(bayesianism). I’ll call this P(B) for short. There’s also some prior P(not bayesianism) that covers all other epistemologies, both those which you know about and those you do not. Does P(B) change:
when you learn of other epistemologies? (I think this breaks up a bayesian’s hypothesis space but there’s no new data)
when you learn about other epistemologies? (Like details about how they work)
when you learn that you had an obstructive misunderstanding about other epistemologies? (Like, you thought it worked like X but actually it was Y)
other events?
You can replace P(B) with P(GR) if you like and ‘epistemologies’ with ‘theories of gravity’. The underlying thing I’m unsure about is where the update to the prior happens and why. If it is when you learn about the new theory, then it seems like an admission that the P(other) option is just outright wrong, so working backwards I guess bayesianism stays consistent by not updating the prior in response to learning a new idea.
Sorry for the delayed reply. My comments are being held for review.
I think you need to make a clearer distinction between a hypothetical perfect bayesian reasoner, who would know all possible hypotheses from the beginning and only narrow them down through observation, and the things humans do to try to approximate that, which will sometimes involve going back and changing the prior when a new hypothesis has been thought of.
Keep in mind that these “probabilities” are subjective assessments of probability based on an individual’s prior knowledge, not facts about reality. Two Bayesians with different prior experience may disagree about how probable something is (/seems to them), but reality will not disagree or debate with itself about the truth of the matter, or assign probability to different possibilities (mumble mumble I don’t really understand quantum mechanics and am pretending it doesn’t matter for the purpose of this conversation).
Whether or not General Relativity is true is unaffected by any probabilities any Bayesian may put on its truth or falsehood when reasoning about the evidence they’ve seen so far. But whether or not a particular Bayesian finds General Relativity to be probably true, is definitely affected by whether they know about it or not. Keep a clear distinction in your mind between “the probability a Bayesian reasoner assigns to some fact being true” and “whether or not that fact is true in reality”—these are not the same.
This is a “map vs. territory” distinction. Bayesian probabilities go into a mental model of how the world works (map), while how the world actually works is separate from the map.
Doesn’t that mean we should expect that Bayesians often disagree and they have no way to resolve it except consulting reality (i.e., an experiment)?
If that’s the case, why bother with Bayesianism at all? It seems like any situation where people needed to agree on the truth of something, Bayesianism wouldn’t help:
They already agree (‘scientific consensus’ and similar situations) -- Bayesianism doesn’t offer anything here.
One person disagrees, everyone else agrees—without more data no progress is possible, so either we know a decisive experiment that will resolve the issue (in which case we don’t need Bayesianism), or we don’t, in which case Bayesianism can’t help.
Lots of disagreement—similar to above but more chaotic.
Also say there is an experiment, is there any standard or agreement among bayesians about how to weight credence? (when should it be weak or strong? etc) Because if there isn’t, they might not even be able to agree on what experiment to do or if it will matter.
The FDA just decided to allow Bayesian statistics to be used to argue for drug approval. If a drug company wants approval they will need to state their priors and then how they updated based on the studies they ran. If the FDA does not agree, they might say “Well, I don’t think those priors are reasonable, please use X as priors”. I expect that the exact rules of how the process goes will evolve with time.
In another practical application space, you have superforecasting, where credence calibration is important. You make progress by believing people with better Brier scores (or log loss). If some scientists in a field can make predictions about experimental outcomes with better Brier scores, then they seem to understand more about the domain and should be trusted.
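As a rough illustration of those two scoring rules, here is a minimal Python sketch for binary forecasts. The forecast numbers and the names `calibrated` and `hedger` are invented for the example and do not come from any real tournament.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def log_loss(forecasts, outcomes):
    """Mean negative log of the probability given to what actually happened (lower is better)."""
    return -sum(
        math.log(p if o == 1 else 1 - p) for p, o in zip(forecasts, outcomes)
    ) / len(forecasts)

# Hypothetical data: 1 means the event happened, 0 means it did not.
outcomes = [1, 0, 1, 1, 0]
calibrated = [0.8, 0.2, 0.7, 0.9, 0.1]  # confident and mostly right
hedger = [0.5] * 5                      # always says 50%

print("Brier:", brier_score(calibrated, outcomes), brier_score(hedger, outcomes))
print("Log loss:", log_loss(calibrated, outcomes), log_loss(hedger, outcomes))
```

Under both rules the forecaster who is confident and right scores better than the one who always says 50%, which is the sense in which a better score suggests more understanding of the domain.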
Are you responding to this? (which I said above)
If so, I think we might be talking at cross-purposes. I’m asking about bayesian epistemology particularly, not statistics. (Nor about estimations based on other people’s beliefs, since that doesn’t seem like an epistemology except maybe JTB with more steps)
If not, I’m not sure what your point is particularly.
The question “How do we know which drugs work and are beneficial to patients?” is an applied epistemology question. Looking at how it gets answered by a sophisticated system tells you how epistemology actually works in practice instead of how philosophers think it’s supposed to work in their ivory tower. If you use Bayesian statistics you want an epistemology behind that use that guides you in how you use the statistics to reason.
Superforecasting is much more about Bayesian epistemology than about Bayesian statistics. There are superforecasters who would say they are Bayesians but can’t write down Bayes’ theorem.
You seem to have some idea that epistemology is supposed to be “objective”. It’s supposed to give you the answer from God’s view. A lot of Western science is built around wanting to reach God’s view. The problem is that God doesn’t exist. According to Nietzsche, he’s dying. Bayesian epistemology is an epistemology without God, which means that you have to deal with your beliefs and other people’s beliefs.
The reason to bother with Bayesianism is not that it helps you to see the world from God’s view but that it has practical utility in applied epistemology, with FDA drug approval and superforecasting being two examples.
Objective in a sense, but I’m not sure how I’ve given you the ‘God’s view’ impression. I think epistemology should be objective in that it should work universally via the same rules, and that people can discuss both ideas and the world (evidence) in such a way to reach agreement in an objective sense. But they can be wrong, there’s no infallible method of getting to the truth. They can also objectively agree on each other’s subjective states.
Putting aside superforecasting because I don’t know much about it, using bayesian statistics for statistical analysis is fine.
But that’s not what’s happening on LW. When people on LW talk about their priors and updating them, they’re not talking about bayesian statistics, they’re talking about epistemology, about what ideas are true. I think those are fundamentally different things and they work in different ways (and it seems like you do, too). I’m here because LW is the largest bayesian forum (or if not the largest it’s better than reddit for discussion, point is, it’s the best option for talking to bayesians).
The idea that we should apply bayesian statistics to epistemic tasks is what I’m interested in discussing.
If superforecasting is important to discuss, can you link me to something that represents what you mean by it?
If you use frequentist statistics you can just claim that your data follows a normal distribution (which it objectively most likely doesn’t; even the archetypal example of height violates a normal distribution because there are more people with dwarfism than the normal distribution assumes). On the other hand, to use Bayesian statistics you do need to decide on priors, which does involve subjective decisions.
The fact that you aren’t explicitly thinking of God doesn’t mean that the idea of objectivity does not come out of a Christian scientific tradition which had as a key motivation trying to see things from God’s view. Your ideas of how you think it should work have that theistic origin.
Bayesianism as discussed on LessWrong is about how it would be good for an agent to reason, and that’s a different goal. Eliezer was interested in it because he wants to know what’s true about how agents reason effectively, to be able to say things about superintelligence.
Philip E. Tetlock separately was interested in the epistemic task of how to reason about what to believe. After the Iraq war the US military thought they had problems with epistemology, as shown by the fact that they got the WMD question so horribly wrong. Out of that there was an IARPA tournament which Tetlock’s team won. Tetlock runs GJOpen, which does solve epistemic problems for entities that want probabilities for certain events happening. He got some grant money from OpenPhil. There’s also Metaculus, which comes out of the rationalist sphere. His book Superforecasting is a good resource if you want to understand how the kind of Bayesianism where people don’t explicitly use statistics works in practice.
I’m not sure what the best shorter source is but maybe https://www.gjopen.com/training/
Short answer: Yes.
Longer answer: Two Bayesians who start out with the same prior probabilities, and see the same evidence, should update their posterior probabilities in the same way, and so their mental models should stay consistent with each other. Two Bayesians who start out with different prior probabilities, but see the same evidence, should update their posterior probabilities in ways that are predictable to each other, and in line with the evidence—that is, if one reasoner (A)‘s prior probability that (for example) General Relativity is true was high, while another (B)’s was low, then when an experiment is run which provides evidence for general relativity, A’s estimates of General Relativity’s likelihood of being true will change less than B’s (because B’s priors were more wrong), but both will update in a direction and to an extent that is predictable to either of them. As they see more and more of the same evidence, their models of the world should converge.
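To make the convergence claim concrete, here is a small Python sketch. The priors (0.9 and 0.1) and the likelihoods (0.8 and 0.4) are numbers I made up for illustration, and it assumes both reasoners agree on how likely each experimental result is under each hypothesis, which real people often don't.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One application of Bayes' rule for a hypothesis H after seeing evidence E."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)

# Hypothetical starting credences in GR for reasoners A and B.
belief_a, belief_b = 0.9, 0.1

# Assumed shared likelihoods: each experiment's result is twice as
# likely if GR is true as it is if GR is false.
P_E_GIVEN_GR, P_E_GIVEN_NOT_GR = 0.8, 0.4

history = [(belief_a, belief_b)]
for _ in range(10):
    belief_a = update(belief_a, P_E_GIVEN_GR, P_E_GIVEN_NOT_GR)
    belief_b = update(belief_b, P_E_GIVEN_GR, P_E_GIVEN_NOT_GR)
    history.append((belief_a, belief_b))

for step, (a, b) in enumerate(history):
    print(f"after {step:2d} experiments: A={a:.4f}  B={b:.4f}  gap={a - b:.4f}")
```

In this toy run B moves more than A on the first experiment, and the gap between them shrinks to under one percentage point after ten experiments. If the two disagreed about the likelihoods as well as the priors, this sketch would not guarantee convergence.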
This is all assuming an ideal Bayesian reasoner with practically-unlimited computing power who doesn’t cheat or decide not to reason according to Bayesian rules when it becomes inconvenient, and humans don’t meet those constraints. But, there’s math to say how much you should update given particular evidence. So:
Yep. “How to weight credence” is a bit unclearly stated, but there’s Bayes’ formula, which tells you how to update your probabilities based on evidence, and that might be what you’re getting at?
Which is (one reason) why to bother with Bayesianism at all. It’s a method of approaching consensus when working under uncertainty. It’s kind of an “agreeing to the rules of the game” situation, where “the rules” are a mathematical equation that says how probabilities must change when people are disagreeing (and “must” here carries the same level of mathematical strength as saying “2+2 must equal 4”; it’s not a thing that was decided by committee). If, for example, you say it’s 95% unlikely/5% likely that something will happen under your idea of how the world works, and then it happens, then if you’re playing fair you make a big update, and if you put numbers on it, Bayes’ rule tells you what your new numbers should be. If you don’t like the new numbers, you have to either acknowledge that what you said your priors were was incorrect, or that what you said your likelihood estimates of different outcomes were was incorrect. So either you retroactively revise how you used to think the world works, or you retroactively revise what you thought would happen and how confident you were, both of which are kind of awkward and embarrassing. And the people you’re disagreeing with, if you don’t make an appropriately sized update given what you told them your priors and likelihood estimates were, can point this out as a fact. And if you’re not very confident, or you don’t think particular evidence should carry much weight, you can express that in a way that makes it clear how much you’re going to update based on whatever evidence you see, before you see it, so it isn’t like:
“I think x is definitely wrong, and y will provide strong evidence”
“OK, so y didn’t go how I thought, now I think x is only almost certain to be false”
It’s like:
“I think X is a% likely to be false, and Y experiment will turn out the way I expect with b% likelihood as a result.”
“Oh. Ok, well, I guess now I’m down to c% likelihood that X is false. Shoot.”
And instead of being like “it’s not fair that you moved from definitely to almost certain based on something you said would provide strong evidence but didn’t go the way you expected”, the reaction is “yep, that math is correct, you updated how you should have given your priors”. And before you get to that point, pre-experiment, you can argue over whether b% likelihood is reasonable, where it’s hard to argue about the correct meaning of the word “strong”.
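Here is roughly what the a%, b%, c% exchange looks like as arithmetic, in Python. The specific numbers are invented, and note one extra input the dialogue leaves implicit: you also need to say how likely Y's expected result would be if X were true (I call it `b_if_x_true`, my own name for it).

```python
def posterior_x_false(a, b, b_if_x_true, went_as_expected):
    """
    a            = P(X is false) before the experiment
    b            = P(Y goes as expected | X is false)
    b_if_x_true  = P(Y goes as expected | X is true)   (assumed extra input)
    Returns c    = P(X is false | what Y actually did), via Bayes' rule.
    """
    like_if_false = b if went_as_expected else 1 - b
    like_if_true = b_if_x_true if went_as_expected else 1 - b_if_x_true
    return a * like_if_false / (a * like_if_false + (1 - a) * like_if_true)

# Hypothetical stated credences before the experiment.
a, b, b_if_x_true = 0.95, 0.90, 0.20

# Y does NOT go the way the speaker expected.
c = posterior_x_false(a, b, b_if_x_true, went_as_expected=False)
print(f"P(X false) moves from {a:.2f} to {c:.2f}")  # roughly 0.95 -> 0.70
```

Everything needed to check the size of the update was stated before the experiment, so the other person can redo this calculation themselves and argue about `b` or `b_if_x_true` instead of about the word “strong”.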
And once you get really familiar with doing this (I’m still not great at it), you know intuitively how much putting X% probability on a particular outcome means you’re going to have to change your views if it doesn’t happen, and you become appropriately cautious (“calibrated”) in your estimates, and your saying things in probabilities conveys a lot of information to other people who are also familiar with talking this way.
All of that is in idealized theory among people who are quite smart and can do lots of calculation in their heads. Lots of people also LARP it and use Bayesian-sounding words without actually having the deeper intuitive understanding of what their words mean.
Okay.
FWIW, CF says that two people should be able to use discussion to resolve most disagreements. Some disagreements do require experiments, but only when we have multiple contradictory theories that otherwise agree. (e.g., UG and GR mostly agree so it’s reasonable that two scientists might disagree on which is correct but they should be able to agree on a decisive experiment.) When such discussions fail there are ways to deal with that, too (so you can go address the reason for failure then resume the original discussion).
Doesn’t that require some assumptions about their hypothesis spaces? You assume that B’s priors are more wrong, but we don’t know that from just saying that A’s prior for GR is higher than B’s [1]. We need to know about all their other priors, too.
I agree the update can be predictable to one another if they specify and communicate the necessary constants / weightings / etc. Regarding specification and communication: for something that seems important to bayesians, I’m surprised that I don’t see more of that.
I’m not convinced of this but I agree it seems that intuitively, most of the time at least, they should.
This surprised me to read and I have some immediate questions.
When is Bayesian reasoning inconvenient?
Do you think there’s other valid ways to reason?
If so, why do we need bayesianism? Is Bayesianism incomplete?
Or if not, how would an ideal Bayesian reasoner ever be fooled into using bad/broken/misleading methods of reasoning?
Are there other more energy efficient ways of reasoning to get to the same answers reliably? Why use bayesianism at all in that case?
Is cheating ever consistent with bayesianism?
Sure, but so is “do whatever Max says”. Consensus on its own has nothing to do with truth. Now, I guess you mean something a little more specific but I don’t want to put words in your mouth.
Here’s an example of how Bayesianism fails to produce consensus (or it seems like that to me):
Two aliens (A1, A2) who do not know about relativity are communicating. A1 observes an event E1 happens before E2, but A2 observes E2 before E1. They will each update in different directions.
Is there a Bayesian way to resolve this?
How does Bayesianism deal with confidence? Do all priors have errors attached? If so, how are errors handled in the maths?
Also, if the error bars are large enough, is convergence still guaranteed? Or what if our errors are off due to things like overconfidence?
I’m not sure how this would help Bayesians, what does a successful discussion of this kind look like?
The reason I’m not sure is unanswered questions like what are they observing to cause updates? and which priors are being updated?
even if we assume GR is absolutely 100% correct, and IRL we know that GR isn’t 100% correct.
I think the missing step is that you’re updating more than you’re updating and . If we use actual numbers, let’s say the Bayesian comes in with , , and . The update based on observing Mercury should be to remove from the standing and renormalize, dividing the remaining probabilities by their sum. So your new probabilities are , , and
When new evidence comes in that falsifies NMP, P(O) jumps up to 0.5.
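A small Python sketch of this remove-and-renormalize step. The starting probabilities are placeholders I picked so that UG_other and O start equal; they are not meant to be anyone's real priors. It also makes the simplifying assumption that each piece of evidence only touches the hypothesis it falsifies.

```python
def bayes_update(beliefs, likelihoods):
    """
    Multiply each hypothesis's probability by the likelihood of the new
    data under it, then renormalize. Hypotheses missing from `likelihoods`
    are treated as unaffected (likelihood 1.0), which is a simplification.
    """
    unnormalized = {h: p * likelihoods.get(h, 1.0) for h, p in beliefs.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical priors (illustrative only).
priors = {"UG": 0.5, "NMP": 0.3, "UG_other": 0.1, "O": 0.1}

# Mercury's orbit has (essentially) zero likelihood under plain UG.
after_mercury = bayes_update(priors, {"UG": 0.0})

# Later evidence falsifies the New Mystery Planet hypothesis.
after_nmp_falsified = bayes_update(after_mercury, {"NMP": 0.0})

print(after_mercury)
print(after_nmp_falsified)
```

With these placeholder numbers NMP goes up after Mercury, and once NMP is falsified the remaining mass is split evenly between UG_other and O only because they started equal.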
Okay I see, thanks.
Is that because we only have and left, and they both started equal? Did you have a reason for setting ?
Also, I guess the is removed because we have right? (Though IRL I understand it’s bayesian convention to choose a very small value rather than 0)
Also, do you know how things change after GR is published? (assuming no new data in the mean time)
I don’t have a reason for setting them equal, no. The prior probabilities could be arbitrarily split between the remaining options.
Yes, that’s correct. If we were to keep experimenting and observing, we would find some data that would have essentially 0 likelihood showing up under
That last question is trickier. If there’s no new data either way, but it predicts reality better than most hypotheses in , you can split it out into and , conserving the sum so that . (Granted, if there are other hypotheses within that line up with reality, then you should split those out as well.)
Then you can compare which specific predictions makes that does not. Once you perform experiments and get data that is extremely unlikely under but likely under , then you rule out and are left with and . Any hypotheses in that inconsistent under that new data also get ruled out, effectively increasing the probability assigned to .
I’m not sure this is true in general. Sometimes we only figure out what data to look for to disprove theory 1 after someone comes up with theory 2. So before that, a bayesian might be gathering lots of data and updating theory 1’s prior towards P=1, but they’d be being misled with no way to know they were being misled. But also, where does theory 2 come from? Either it was something the bayesian could have thought up themselves (in which case why were they updating in the wrong direction?), or bayesianism is incomplete and theory 2 was generated in a non-bayesian way (and in this case it’s hard to see bayesianism as anything but inferior to this other method; at the very least the sum of both methods is definitely better than bayesianism alone).
There are an infinite number of those, though. Putting that aside, there are other mathematical constructions distinct from GR that are in , but we don’t know what they are yet (just like GR prior to publication).
And it seems like in order to split sensibly we’d need to know the distribution of probabilities over these unknown hypotheses in the hypothesis space. If we used a rule like always split in half, then our result is dependent on which order we learn the theories in (which seems illogical).
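To show what I mean by order dependence, here is a tiny Python sketch. The "always split in half" rule, the starting numbers, and the second theory `T2` are all hypothetical; this is not a rule anyone in the thread proposed as correct.

```python
def split_in_half(beliefs, group, new_name):
    """
    Carve a newly learned theory out of a catch-all group by giving it half
    of the group's probability. The total probability is conserved.
    """
    updated = dict(beliefs)
    updated[new_name] = beliefs[group] / 2
    updated[group] = beliefs[group] / 2
    return updated

# Hypothetical state before any new theory is learned.
start = {"UG_other": 0.6, "O": 0.4}

# Learn GR first, then some other theory T2 ...
gr_first = split_in_half(split_in_half(start, "O", "GR"), "O", "T2")
# ... versus learning T2 first, then GR.
t2_first = split_in_half(split_in_half(start, "O", "T2"), "O", "GR")

print("GR learned first: ", gr_first)
print("GR learned second:", t2_first)
```

No data changed between the two runs, yet GR ends up at 0.2 in one and 0.1 in the other, purely because of the order the theories were learned in.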
But UG_other makes all possible predictions (since it’s a group of many hypotheses). Or close to ‘all’, at least.
Oh and I think I just saw an issue with your original reasoning:
The evidence that falsified NMP also falsified some hypotheses of O (but not GR). So P(O) would not be 0.5.