Upvoted. Thanks for a long reply, even if not a fully even-handed one (also, explaining CIs using a discrete set of hypotheses is perhaps not the most transparent approach). I will try to give an example with a continuous-valued parameter.
Say we want to estimate the mean height of LW posters. Ignoring the issue of sock puppets for the moment, we could pick LW usernames out of a hat, show up at the house of the person with that username, and measure their height. Say we do that for 100 LW users we picked randomly, and take an average, call it X1. The 100 users are a “sample” and X1 is a “sample mean.” If we randomly picked a different set of 100, we would get a different average, call it X2. With yet another set of 100, we would get yet another average, call it X3, and so on.
These X1, X2, X3 are draws from something called the “sampling distribution” of the sample mean, call it Ps. This distribution is a different thing from the distribution that governs height among all LW users, call it Ph. Ph could be anything in general: maybe Gaussian, maybe bimodal, maybe something weird. But if we can figure out what the distribution Ps is, we could make statements of the form
“most of the time when I draw an Xi from Ps, i.e. most of the time I pick 100 LW users at random and average their heights, this average will be pretty close to the real average height of all LW users, under a very small set of assumptions on Ph.”
This is what confidence intervals are about. In fact, if the number of LW users we pick for our sample is large enough, we can approximate Ps well by a Gaussian distribution because of a neat result called the Central Limit Theorem (again regardless of what Ph is, or more precisely under very mild assumptions on Ph, such as finite variance).
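To make the Ps versus Ph distinction concrete, here is a rough simulation sketch. Everything specific in it is made up for illustration: the bimodal shape of Ph, the heights in cm, the pretend population of 20,000 users, and the helper name draw_height. It draws many samples of 100, and checks that the sample means spread out roughly the way the Central Limit Theorem predicts.

```python
import random
import statistics

random.seed(0)

def draw_height():
    """Hypothetical Ph: a bimodal mixture of two Gaussians (heights in cm)."""
    if random.random() < 0.5:
        return random.gauss(165.0, 6.0)
    return random.gauss(180.0, 6.0)

# Stand-in for "all LW users"; the size 20,000 is an arbitrary assumption.
population = [draw_height() for _ in range(20_000)]
true_mean = statistics.fmean(population)
true_sd = statistics.pstdev(population)

# Draw many samples of n = 100 and record each sample mean (X1, X2, X3, ...).
n = 100
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(5_000)
]

# CLT prediction: Ps is roughly Gaussian with standard deviation true_sd / sqrt(n).
predicted_se = true_sd / n ** 0.5
observed_se = statistics.stdev(sample_means)

# Fraction of sample means within 1.96 predicted standard errors of the true mean.
within = sum(
    abs(x - true_mean) <= 1.96 * predicted_se for x in sample_means
) / len(sample_means)

print(f"predicted SE {predicted_se:.3f}, observed SE {observed_se:.3f}")
print(f"fraction of sample means within 1.96 SE: {within:.3f}")
```

If the Gaussian approximation to Ps is good, the two standard errors should be close and the last number should land near 0.95, even though Ph itself is nowhere near Gaussian.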
What makes these kinds of statements powerful is that we can sometimes make them without needing to know much at all about Ph. Sometimes it is useful to be able to say something like that—maybe we are very uncertain about Ph, or we suspect shenanigans with how Ph is defined.
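A second sketch, under the same caveats, of the "without needing to know much about Ph" point. The interval below is built from one sample only (its mean and standard deviation) using the usual Gaussian 1.96 multiplier, and then checked against three invented shapes for Ph. The names ci_95 and coverage and all the distribution parameters are my own assumptions for the example, not anything from the discussion above.

```python
import random
import statistics

def ci_95(sample):
    """Approximate 95% CI for the mean, computed from the sample alone."""
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return m - 1.96 * se, m + 1.96 * se

def coverage(draw, true_mean, n=100, trials=2_000, seed=1):
    """Fraction of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = ci_95([draw(rng) for _ in range(n)])
        hits += lo <= true_mean <= hi
    return hits / trials

# Three hypothetical shapes for Ph, each paired with its known true mean (cm).
shapes = {
    "gaussian": (lambda rng: rng.gauss(172.0, 8.0), 172.0),
    "bimodal": (
        lambda rng: rng.gauss(165.0, 6.0) if rng.random() < 0.5
        else rng.gauss(180.0, 6.0),
        172.5,
    ),
    "skewed": (lambda rng: 150.0 + rng.expovariate(1 / 20.0), 170.0),
}

results = {name: coverage(draw, mu) for name, (draw, mu) in shapes.items()}
for name, frac in results.items():
    print(f"{name}: interval covers the true mean in {frac:.3f} of trials")
```

The procedure never looks at which Ph generated the data, yet the coverage should come out in the neighborhood of 95% for all three. With a skewed Ph and only 100 people per sample it tends to fall slightly short, which is the "large enough sample" caveat showing up.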