Tangent thread: What sophisticated idea are you holding on to that you are sure has been formalized somewhere but haven’t been able to find?
I’ll go first: when called to explain and defend my ethics, I said I believe in “Karma. No, not that BS mysticism Karma, but the plain old actions-have-consequences-in-our-very-connected-world kind of Karma.” If you treat people with honesty and integrity in all things, you will create a community of cooperation. The world is strongly interconnected and strongly adaptable, so the benefits will continue outside your normal community, or if you frequently change communities. The linchpin assumption of these beliefs is that if I create One Unit of Happiness for others, it will self-propagate, grow, and reflect, returning me more than One Unit of Happiness over the course of my lifetime. The same applies to One Unit of Misery.
I’ve only briefly studied ethics and philosophy; can someone better-read point me to the above in a formal context?
I believe Rational Self Interest types make similar arguments, though I can’t recall anyone breaking it down to marginal gains in utility.
This seems like a good place to ask about something that I’m intensely curious about but haven’t yet seen discussed formally. I’ve wanted to ask about it before, but I figured it’s probably an obvious and well-discussed subject that I just haven’t gotten to yet. (I only know the very basics of Bayesian thinking, I haven’t read more than about 1⁄5 of the sequences so far, and I don’t yet know calculus or advanced math of any type. So there are an awful lot of well-discussed LW-type subjects that I haven’t gotten to yet.)
I’ve long conceived of Bayesian belief statements in the following (admittedly fuzzy) way: Imagine a graph where the x-axis represents our probability estimate for a given statement being true and the y-axis represents our certainty that our probability estimate is correct. So if, for example, we estimate a probability of .6 for a given statement to be true but we’re only mildly certain of that estimate, then our belief graph would probably look like a shallow bell curve centered on the .6 mark of the x-axis. If we were much more certain of our estimate then the bell curve would be much steeper.
I usually think of the height of the curve at any given point as representing how likely I think it is that I’ll discover evidence that will change my belief. So for a low bell curve centered on .6, I think of that as meaning that I’d currently assign the belief a probability of around .6 but I also consider it likely that I’ll discover evidence (if I look for it) that can change my opinion significantly in any direction.
I’ve found this way of thinking to be quite useful. Is this a well-known concept? What is it called and where can I find out more about it? Or is there something wrong with it?
Imagine a graph where the x-axis represents our probability estimate for a given statement being true and the y-axis represents our certainty that our probability estimate is correct. So if, for example, we estimate a probability of .6 for a given statement to be true but we’re only mildly certain of that estimate, then our belief graph would probably look like a shallow bell curve
I don’t understand where the bell curve is coming from. If you have one probability estimate for a given statement with some certainty about it, you would depict it as a single point on your graph.
The bell curves in this context usually represent probability distributions. The width of the distribution reflects your uncertainty: if you’re certain, the distribution is narrow and looks like a spike at the estimated value; if you’re uncertain, the distribution is flatter. The area under a probability density must equal 1, so the narrower the distribution, the higher its peak.
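This point can be sketched numerically; the beta family below is my illustrative choice (the thread does not name a specific distribution):

```python
from scipy import stats

# Two beta distributions over the probability p, both with mean 0.6:
# Beta(a, b) has mean a / (a + b).
uncertain = stats.beta(a=3, b=2)     # low concentration: wide, shallow curve
confident = stats.beta(a=60, b=40)   # high concentration: narrow, steep curve

print(uncertain.mean(), confident.mean())  # both 0.6
print(uncertain.std(), confident.std())    # ~0.20 vs ~0.05: width = uncertainty

# Densities integrate to 1, so the narrower curve must peak higher.
print(confident.pdf(0.6) > uncertain.pdf(0.6))  # True
```

The "same mean, different width" pair is exactly the shallow-vs-steep bell curve described above.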
How likely you are to discover new evidence is neither here nor there. Even if you are very uncertain of your estimate, this does not convert into the probability of finding new evidence.
I think you’re referring to the type of statement that can have many values. Something like “how long will it take for AGI to be developed?”. My impression (correct me if I’m wrong) is that this is what’s normally graphed with a probability distribution. Each possible value is assigned a probability, and the result is usually more or less a bell curve with the width of the curve representing your certainty.
I’m referring to a very basic T/F statement. On a normal probability distribution graph that would indeed be represented as a single point—the probability you’d assign to it being true. But we’re often not so confident in our assessment of the probability we’ve assigned, and that confidence is what I was trying to represent with the y-axis.
An example might be, “Will AGI be developed within 30 years?” There’s no range of values here, so on a normal probability distribution graph you’d simply assign a probability and that’s it. But there’s a very big difference between saying “I really have not the slightest clue, but if I really must assign it a probability then I’d give it maybe 50%” vs. “I’ve researched the subject for years and I’m confident in my assessment that there’s a 50% probability”.
In my scheme, what I’m really discussing is the probability distribution of probability estimates for a given statement. So for the 30-year AGI question, what’s the probability that you’d consider a 10% probability estimate to be reasonable? What about a 90% estimate? The probability that you’d assign to each probability estimate is depicted as a single point on the graph and the result is usually more or less a bell curve.
How likely you are to discover new evidence is neither here nor there. Even if you are very uncertain of your estimate, this does not convert into the probability of finding new evidence.
You’re probably correct about this. But I’ve found the concept of the kind of graph I’ve been describing to be intuitively useful, and saying that it represents the probability of finding new evidence was just my attempt at understanding what such a graph would actually mean.
In my scheme, what I’m really discussing is the probability distribution of probability estimates for a given statement.
OK, let’s rephrase it in terms of Bayesian hierarchical models. You have a model of event X happening in the future which says that the probability of that event is Y%. Y is a parameter of your model. What you are doing is giving a probability distribution for a parameter of your model (in the general case this distribution can be conditional, which makes it a meta-model, so hierarchical). That’s fine, you can do this. In this context the width of the distribution reflects how precise your estimate of the lower-level model parameter is.
The only thing is that for unique events (“will AGI be developed within 30 years”) your hierarchical model is not falsifiable. You will get a single realization (the event will either happen or it will not), but you will never get information on the “true” value of your model parameter Y. You will get a single update of your prior to a posterior and that’s it.
Is that what you have in mind?
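As an editorial aside, that single prior-to-posterior update has a standard closed form if one assumes a conjugate (beta) prior on the parameter Y; the beta choice is my assumption for illustration, not something stated in the comment:

```python
# One-shot update for a unique event: a Beta(a, b) prior over the model
# parameter Y = P(event), revised exactly once when the event resolves.
def update_beta(a, b, event_happened):
    # Conjugate Bernoulli update: add one pseudo-count to the observed side.
    return (a + 1, b) if event_happened else (a, b + 1)

prior = (2.0, 2.0)  # mean 0.5, fairly uncertain
posterior = update_beta(*prior, event_happened=True)

prior_mean = prior[0] / (prior[0] + prior[1])              # 0.5
post_mean = posterior[0] / (posterior[0] + posterior[1])   # 0.6
print(prior_mean, post_mean)
# That single move from 0.5 to 0.6 is the only update a one-off event
# can ever give us about Y.
```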
I think that is what I had in mind, but it sounds from the way you’re saying it that this hasn’t been discussed as a specific technique for visualizing belief probabilities.
That surprises me since I’ve found it to be very useful, at least for intuitively getting a handle on my confidence in my own beliefs. When dealing with the question of what probability to assign to belief X, I don’t just give it a single probability estimate, and I don’t even give it a probability estimate with the qualifier that my confidence in that probability is low/moderate/high. Rather I visualize a graph with (usually) a bell curve peaking at the probability estimate I’d assign and whose width represents my certainty in that estimate. To me that’s a lot more nuanced than just saying “50% with low confidence”. It has also helped me to communicate to others what my views are for a given belief. I’d also suspect that you can do a lot of interesting things by mathematically manipulating and combining such graphs.
One problem is that it’s turtles all the way down.
What’s your confidence in your confidence probability estimate? You can represent that as another probability distribution (or another model, or a set of models). Rinse and repeat.
Another problem is that it’s hard to get reasonable estimates for all the curves that you want to mathematically manipulate. Of course you can wave hands and say that a particular curve exactly represents your beliefs and no one can say it ain’t so, but fake precision isn’t exactly useful.
I’m referring to a very basic T/F statement. On a normal probability distribution graph that would indeed be represented as a single point—the probability you’d assign to it being true. But we’re often not so confident in our assessment of the probability we’ve assigned, and that confidence is what I was trying to represent with the y-axis.
Taken literally, the concept of “confidence in a probability” is incoherent. You are probably confusing it with one of several related concepts. Lumifer has described one example of such a concept.
Another concept is how much you think your probability estimate will change as you encounter new evidence. For example, your estimate for whether the outcome of the coin flip for the 2050 Super Bowl will be heads is 1⁄2, and you are unlikely to encounter evidence that changes it (until 2050, that is). On the other hand, your estimate of the probability of AI being developed by 2050 is likely to change a lot as you encounter more evidence.
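The contrast can be made concrete; the beta pseudo-counts below are illustrative numbers of mine, not anything from the thread:

```python
def posterior_mean(a, b, heads, tails):
    # Beta(a, b) prior plus observed Bernoulli outcomes -> posterior mean.
    return (a + heads) / (a + b + heads + tails)

# Both beliefs start at probability 0.5, with very different resilience.
coin = (1000.0, 1000.0)  # coin flip: huge implicit evidence behind the 1/2
ai_2050 = (1.0, 1.0)     # AI by 2050: nearly flat prior, easily moved

# One new favorable observation shifts each estimate:
print(posterior_mean(*coin, heads=1, tails=0))     # ~0.5002, barely moves
print(posterior_mean(*ai_2050, heads=1, tails=0))  # ~0.667, moves a lot
```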
I don’t know, I think the existence of the 2050 Super Bowl is significantly less than 100% likely.
What’s your line of thought?
It wouldn’t be the first time a sport has gone from vastly popular to mostly forgotten within 40 years. Jai alai was the particular example I had in mind; it was once incredibly popular, but quickly descended to the point where it’s basically entirely forgotten.
Taken literally, the concept of “confidence in a probability” is incoherent.
Why? I thought the way Lumifer expressed it in terms of Bayesian hierarchical models was pretty coherent. It might be turtles all the way down as he says, and it might be hard to use it in a rigorous mathematical way, but at least it’s coherent. (And useful, in my experience.)
Another concept is how much you think your probability estimate will change as you encounter new evidence.
This is pretty much what I meant in my original post by writing:
I usually think of the height of the curve at any given point as representing how likely I think it is that I’ll discover evidence that will change my belief. So for a low bell curve centered on .6, I think of that as meaning that I’d currently assign the belief a probability of around .6 but I also consider it likely that I’ll discover evidence (if I look for it) that can change my opinion significantly in any direction.
But expressing it in terms of how likely my beliefs are to change given more evidence is probably better. Or to say it in yet another way: how strong new evidence would need to be for me to change my estimate.
It seems like the scheme I’ve been proposing here is not a common one. So how do people usually express the obvious difference between a probability estimate of 50% for a coin flip (unlikely to change with more evidence) vs. a probability estimate of 50% for AI being developed by 2050 (very likely to change with more evidence)?
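One conventional way to quantify “how strong new evidence would need to be” is the Bayes factor, with Bayes’ rule written in odds form; this framing is standard, but the sketch below is my own illustration rather than anything proposed in the thread:

```python
def bayes_factor_needed(p_from, p_to):
    # Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio,
    # so the evidence strength needed is the ratio of the two odds.
    prior_odds = p_from / (1.0 - p_from)
    posterior_odds = p_to / (1.0 - p_to)
    return posterior_odds / prior_odds

# Moving any belief from 50% to 90% takes 9:1 evidence...
print(bayes_factor_needed(0.5, 0.9))  # ~9.0
# ...for either question; the difference is how likely you are to ever
# encounter evidence that strong (coin flip: almost never; AI: plausibly).
```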
I believe you may be confusing the “map of the map” for the “map”.
If I understand correctly, you want to represent your beliefs about a simple yes/no statement. If that is correct, the appropriate distribution for your prior is Bernoulli. For a Bernoulli distribution, the X axis only has two possible values: True or False. The Bernoulli distribution will be your “map”. It is fully described by the parameter “p”.
If you want to represent your uncertainty about your uncertainty, you can place a hyperprior on p. This is your “map of the map”. Generally, people will use a beta distribution for this (rather than a bell-shaped normal distribution). With such a hyperprior, p is on the X-axis and ranges from 0 to 1.
I am slightly confused about this part, but it is not clear to me that we gain much from having a “map of the map” in this situation, because no matter how uncertain you are about your beliefs, the hyperprior will imply a single expected value for p.
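That last observation can be checked directly; the specific beta parameters below are my own illustrative choices:

```python
from scipy import stats

# Two hyperpriors on p with the same expected value but very different
# higher-order uncertainty.
clueless = stats.beta(1, 1)    # "not the slightest clue": uniform on [0, 1]
studied = stats.beta(50, 50)   # "researched it for years": tight around 0.5

print(clueless.mean(), studied.mean())  # both 0.5: same one-number forecast
print(clueless.std(), studied.std())    # ~0.29 vs ~0.05: the spread differs

# The spread only matters once something depends on more than the mean,
# e.g. how far the estimate moves when new evidence arrives.
```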
What sophisticated idea are you holding on to that you are sure has been formalized somewhere but haven’t been able to find?
The influence of the British Empire on progressivism.
There was that book that talked about how North Korea got its methods from the Japanese occupation, and as soon as I saw that, I thought, “well, didn’t something similar happen here?” A while after that, I started reading Imagined Communities, got to the part where Anderson talks about Macaulay, looked him up, and went, “aha, I knew it!” But as far as I know, no one’s looked at it.
Also, I think I stole “culture is an engineering problem” from a Front Porch Republic article, but I haven’t been able to find the article, or anyone else writing rigorously about anything closer in ideaspace to that than dynamic geography, except the few people who approach something similar from an HBD or environmental determinism angle.