“Most people make the mistake of generalizing from a single data point. Or at least, I do.”—SA
When can you learn a lot from one data point? People, especially stats- or science-brained people, are often confused about this, and frequently give answers that (imo) are the opposite of useful. E.g. they say that usually you can’t learn much, but that if you know a lot about the meta-structure of your distribution (e.g. you’re interested in the mean of a distribution with low variance), a single data point can sometimes be a significant update.
This type of limited conclusion looks epistemically humble on the face of it, but in practice it’s the opposite of correct. Single data points aren’t particularly useful when you already know a lot, but they’re very useful when you have very little knowledge to begin with. If your uncertainty about the variable in question spans many orders of magnitude, the first observation can often reduce more uncertainty than the next 2-10 observations put together[1]. Put another way, the situations where a single data point justifies a massive update are exactly the ones where you know very little to begin with.
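To put rough numbers on the "first observation vs. the next several" claim, here is a minimal sketch under assumptions that are mine, not the post's: the unknown is the log10 of some quantity, your prior over it is normal, and each observation is the true value plus independent normal noise. The function names and all the standard deviations are hypothetical, chosen only for illustration. Information gained is measured as the drop in the entropy of your belief, in bits.

```python
import math

def posterior_variance(prior_var, noise_var, n):
    """Variance of a normal belief after n independent noisy observations."""
    return 1.0 / (1.0 / prior_var + n / noise_var)

def bits_gained(prior_var, noise_var, n_before, n_after):
    """Entropy reduction (in bits) from having n_before to having n_after observations."""
    before = posterior_variance(prior_var, noise_var, n_before)
    after = posterior_variance(prior_var, noise_var, n_after)
    # Entropy of a normal is 0.5 * log2(2*pi*e*var), so the difference
    # between two normals depends only on the ratio of their variances.
    return 0.5 * math.log2(before / after)

# Hypothetical numbers, all in log10 units of the unknown quantity.
NOISE_SD = 0.1          # each observation is good to about a quarter of a doubling
VAGUE_PRIOR_SD = 3.0    # prior uncertainty spans several orders of magnitude
SHARP_PRIOR_SD = 0.05   # you already know the answer quite well

for label, prior_sd in [("vague prior", VAGUE_PRIOR_SD), ("sharp prior", SHARP_PRIOR_SD)]:
    first = bits_gained(prior_sd**2, NOISE_SD**2, 0, 1)
    next_ten = bits_gained(prior_sd**2, NOISE_SD**2, 1, 11)
    print(f"{label}: first observation {first:.2f} bits, next ten combined {next_ten:.2f} bits")
```

With these made-up numbers, the vague prior gets about 4.9 bits from the first observation and about 1.7 bits from the next ten combined. With the sharp prior the first observation is worth a small fraction of a bit, less than the next ten together. The exact figures depend entirely on the assumed model; the pattern is what matters.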
For example, if an alien sees a human car for the first time, the alien can make massive updates on many different things regarding Earthling society, technology, biology and culture. Similarly, an anthropologist landing on an island of a previously uncontacted tribe can rapidly learn so much about a new culture from a single hour of peaceful interaction [2].
Some other examples:
Your first day at a new job.
First time visiting a country/region you previously knew nothing about. One afternoon in Vietnam tells you roughly how much things cost, how traffic works, what the food is like, what languages people speak, and how people interact with strangers.
Trying a new fruit for the first time. One bite of durian tells you an enormous amount about whether you’ll like durian.
Your first interaction with someone’s kid tells you roughly how old they are, how verbal they are, and what they’re like temperamentally. You go from “I know nothing about this child” to a working model.
Far from being idiosyncratic and unscientific, these forms of “generalizing from a single data point” are very normal, and very important, parts of everyday human life and street epistemology.
This is the point that Douglas Hubbard tries to hammer in repeatedly over the course of his book (How to Measure Anything): you know less than you think you do, and a single measurement can sometimes be a massive update.
[1] This is basically tautological given a high-entropy prior.
[2] I like Monolingual Fieldwork as a demonstration for the possibilities in linguistics: https://www.youtube.com/watch?v=sYpWp7g7XWU&t=2s
See also: The First Sample Gives the Most Information
wow thanks! It’s the same point but he puts it better.
That’s a case of reducing high uncertainty (high entropy). The more classical Bayesian case where you learn a lot is when you were previously very certain about what the first data point would look like (i.e. you “know” a lot in your terminology, though knowledge implies truth, so that’s arguably the wrong term), but the first data point then turns out to be very different from what you expected.
So in summary, you will learn very little from a single example if both a) you are very sure about what it will look like and b) it then actually looks very much like you expected.
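A toy illustration of that summary, as a sketch rather than anything definitive: two hypotheses, a confident prior on the first, and one observation that either matches or contradicts what the favoured hypothesis predicts. The probabilities and the helper names are hypothetical. The size of the update is measured as the KL divergence from prior to posterior, in bits.

```python
import math

def update(prior, likelihoods):
    """Bayes' rule over a discrete set of hypotheses."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def kl_bits(posterior, prior):
    """KL divergence D(posterior || prior) in bits: how far the belief moved."""
    return sum(q * math.log2(q / p) for q, p in zip(posterior, prior) if q > 0)

# Hypothetical setup: you are 99% sure of hypothesis A over hypothesis B.
prior = [0.99, 0.01]

# Probability each hypothesis assigns to the observation you actually saw.
as_expected = [0.99, 0.10]   # the data point looks the way A predicted
surprising = [0.01, 0.90]    # the data point looks the way B predicted

for label, likelihoods in [("as expected", as_expected), ("surprising", surprising)]:
    posterior = update(prior, likelihoods)
    print(f"{label}: posterior on A = {posterior[0]:.3f}, "
          f"update = {kl_bits(posterior, prior):.3f} bits")
```

Under these assumed numbers the expected observation barely moves the belief, while the surprising one drags a 99% belief down to roughly even odds, an update of a couple of bits from a single data point.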