I think there is (sometimes) value in distinguishing two separate probabilities for any given thing. There’s the “naïve” probability that you estimate while ignoring the possibility that you’ve blundered, that you misread something critical, that some underlying assumption of yours is wrong in a way that never crossed your mind, etc. And then there’s the “pessimistic” probability that tries to account for those things.
You want these to be separate because if you’re doing a calculation using the various probabilities, sometimes it’s better to do all the calculations using “naïve” probabilities and then do a final correction at the end for blunders, wrong fundamental assumptions, etc.
… Maybe. It depends on what the calculation is, what sort of out-of-model errors there might be, etc.
Of course this is a rough heuristic. I think what it’s an approximation to is a more careful tracking of lots of conditional probabilities (people around here sometimes talk as if being a Bayesian means assigning probabilities to things, but it would be more precise to say that being a Bayesian means assigning conditional probabilities to things, and a lot of the information is in that extra structure). E.g., suppose there are 100 things, each of which you give naïve probability 10^-9 to, but there’s a 10^-3 chance that some fundamental error in your model makes them actually happen 1/10 of the time. Then your “adjusted” probability for each one is about 10^-4, and if you use those to estimate the probability that at least one happens, treating the 100 things as independent, you get about 10^-2. But in this situation (assuming that the “fundamental error in your model” is actually the only substantial cause of out-of-model errors) that probability can’t be much more than the 10^-3 chance of the error itself, and if the error makes the things happen together rather than independently it is more like 10^-4. Of course, if you make a calculation like that then sometimes there’s a fundamental error in your model of where the possible errors come from :-).
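To make the arithmetic in that example concrete, here is a minimal Python sketch. The constant names are mine, and the two "given the error" cases are assumptions about what "happen 1/10 of the time" means: either each thing independently has a 1/10 chance once the model is wrong, or all 100 stand or fall together.

```python
import math

N = 100            # number of things
P_NAIVE = 1e-9     # in-model probability of each thing
P_ERROR = 1e-3     # chance the model is fundamentally wrong
P_IF_ERROR = 0.1   # chance of each thing, given that the model is wrong


def at_least_one(p, n):
    """P(at least one of n independent events, each with probability p)."""
    # expm1/log1p keep precision when p is tiny.
    return -math.expm1(n * math.log1p(-p))


# Per-thing probability after folding the model error into each estimate.
adjusted = P_ERROR * P_IF_ERROR + (1 - P_ERROR) * P_NAIVE

# Mistake: treat the adjusted probabilities as 100 independent chances.
flat = at_least_one(adjusted, N)

# Better: condition on whether the model error is real, then combine.
no_error = (1 - P_ERROR) * at_least_one(P_NAIVE, N)
# Assumption A: given the error, the things are independent of each other.
independent_given_error = P_ERROR * at_least_one(P_IF_ERROR, N) + no_error
# Assumption B: given the error, the things all happen together or not at all.
together_given_error = P_ERROR * P_IF_ERROR + no_error

print(f"adjusted per-thing probability:   {adjusted:.2e}")
print(f"flat estimate of 'at least one':  {flat:.2e}")
print(f"conditional, independent (A):     {independent_given_error:.2e}")
print(f"conditional, all together (B):    {together_given_error:.2e}")
```

Under these assumptions the flat estimate comes out near 10^-2, while the conditional calculation gives roughly 10^-3 in case A and roughly 10^-4 in case B. Either way, it cannot exceed the chance of the model error by much.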
Hmm, my point though is that you’re mistaken if you think you can separate these two, because you’re the embedded agent making both predictions, so your naive prediction isn’t actually independent of you, the fallible being making the predictions.
I’d compare this to the concept of significant digits in science. Like, yeah, you can get highly accurate measurements, but as soon as you stick them in calculations they get eaten up by the error in other measurements. I’m claiming the same thing happens here for humans: beyond a certain point our predictions are dominated by our own errors. Maybe my particular numbers are not representative of all scenarios, but I think the point stands regardless; you just have to dial in the numbers to match reality.
I completely agree that beyond a certain point our predictions are dominated by our own errors, but I’m not sure that that’s always well modelled by just moving all probability estimates that are close to 0 or 1 away by (say) 10^-3.
Example: Pascal’s mugging. (This is an example where just moving everything away from 0 or 1 is probably a bad idea, but to be clear I think it isn’t an example where it would help much to separate out your “in-model” and “out-of-model” errors.) Someone comes to you and says: I am a god/wizard/simulation-operator and can do such-and-such things which produce/destroy incredibly large amounts of utility; pay me $1000 and I’ll do that in your favour rather than against you. You say: haha, no, my estimate of the probability that you can swing 3^^^3 utils is less than 1/3^^^3, so go jump in a lake.
In this situation, if instead you say “gosh, I could be wrong in all sorts of ways, so I’d better revise that probability estimate to say 10^-6” and then go ahead and do your expected-utility calculation, then you pay the mugger every time. Even after they say “behold, now I shall create you a mountain of gold just to prove I can” and nothing happens and they say “ah, well, I’m just testing your faith; would you like to give me another $1000 now?”.
Perhaps the right way to handle this is to say that utility 1/epsilon is no better than probability epsilon, embrace scope insensitivity, and pretend that they were only offering/threatening 10^6 utils and your probability is only 10^-6, or something like that. And maybe that says that when someone makes such a claim you should give them a dollar if that’s what they ask for, and see whether they deliver.
I am not at all confident that that’s really a good approach, but if you do handle it that way then you need to be able to reason that after you give them a dollar and they fail, you shouldn’t give them another dollar because however improbable their claim was before, it’s 100x more improbable now. You can’t do that if you just mechanically turn all very small probabilities into 10^-6 or whatever.
I don’t have a clearly-satisfactory approach to offer instead. But I think this sort of example demonstrates that sometimes you need to do something more sophisticated than pushing all tiny probabilities away from zero.
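The difference between mechanically flooring tiny probabilities and updating on the failed demonstration can be sketched in a few lines of Python. Everything numeric here is an illustrative assumption rather than a claim about the right values: a prior of 10^-6, a payoff of 10^7 utils (so that the first dollar is clearly worth it), a dollar counted as one util, and a likelihood ratio of 1/100 for each failed demonstration, matching the “100x more improbable” above.

```python
PRIOR = 1e-6        # assumed starting probability that the claim is true
PAYOFF = 1e7        # assumed utils on offer (illustrative only)
COST = 1.0          # one dollar, counted as one util (assumption)
FAIL_LR = 1 / 100   # each failed demonstration makes the claim 100x less likely
FLOOR = 1e-6        # the mechanical "never go below this" rule


def update(p, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * LR."""
    odds = p / (1 - p) * likelihood_ratio
    return odds / (1 + odds)


def decisions(rounds, use_floor):
    """Return, per round, whether the agent pays the mugger.

    Each time the agent pays, the mugger fails to deliver and the agent
    updates on that failure. With use_floor=True the result is then
    pushed back up to FLOOR, which throws the evidence away.
    """
    p = PRIOR
    paid = []
    for _ in range(rounds):
        pay = p * PAYOFF > COST   # crude expected-utility test
        paid.append(pay)
        if pay:
            p = update(p, FAIL_LR)
            if use_floor:
                p = max(p, FLOOR)
    return paid


print("updating agent:", decisions(3, use_floor=False))
print("flooring agent:", decisions(3, use_floor=True))
```

With these numbers the updating agent pays once and then stops, while the flooring agent pays in every round, because the floor erases exactly the evidence that should have stopped it.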
I guess an instrumental approach I’ve been advocating on this site for a long time is to estimate the noise level, call it “practically zero” and treat anything at that level as such. For example, in the Pascal’s mugger case, there are so many disjunctive possibilities under which you would hear the same story, each with higher odds than the story as told being true, that there is no reason to privilege believing what you hear over all the higher-probability options, including dreaming, hallucinating, a con, a psych experiment, candid camera… It’s not about accurately estimating EV and so becoming susceptible to blackmail; it’s about rejecting anything at the noise level. Which, I guess, is another way to say “epsilon”: not technically zero, but as good as.
You can at least estimate some lower bounds on self-error, even if you can’t necessarily be certain of upper ones. That’s better than nothing, which is what you get if you don’t separate the probabilities.
For example, my performance on test questions where I know the subject backwards and forwards isn’t 100%, because sometimes I misread the question, or have a brain fart while working out answers, and so on. On the other hand, most of these are localized errors. Given extra time, opportunity to check references, consult with other people, and so on, I can reduce these sorts of errors a great deal.
There is value in knowing this.