Raemon comments on Searching for outliers

Raemon 28 Mar 2022 18:35 UTC
13 points
This post gave me a crisp update and inspired some conversations about how many samples you need.
Around the office, the idea that the first sample gives the most information has come up a bunch, and has worked its way into my thinking in a fairly deep way. It changed how I think about the LessWrong review, and about which hypotheses are worth expensively testing. There’s a sense in which “an exploratory blogpost” is “the first sample” from a distribution of “what it’s like to think about a topic.” If I’m trying to find high-value concepts, it’s maybe more useful to only bother grabbing “the first sample” from a given concept. (Some thoughts on this here, although not spelled out in that language.)
But I feel like this post gave me another angle on this, one that I’m still chewing on: some distributions are heavy-tailed, and then you need to very crudely sample from lots of them to find the high-yield samples. Sometimes the first sample isn’t the most informative, because the most important things are heavy-tailed.
When it comes to writing blogposts, I guess the two previous paragraphs are saying the same thing, framed differently. I have an intuition that there’s an overarching frame that’d be helpful to me. Still chewing on it.
I wanted to jot down this comment for now to say “thanks for contributing.” I at first thought this post was just gonna be “a nicely written-up summary of stuff I already knew about heavy tails”, but I feel like I got something deep out of it.