That clarifies it and makes a lot of sense. Seems my objection rested upon a misunderstanding of your true intention. In short, no worries.
I look forward to figuring this out together.
I’m currently working through my own thoughts and vision for tagging.
If and when we end up building a tagging system on LessWrong, the goal will be to distinguish the main types of posts people are interested in viewing, and to create a limited number of tags that capture this. I think building this will mainly help users who are viewing new content on the frontpage, and that for much more granular sorting of historical content, a wiki is better placed.
I’m pretty sure I disagree with this, and I object to you making an assertion that makes it sound like the team has definitely decided what the goal of the tagging system will be.
I’ll write a proper response tomorrow.
I really like the green-unread on post pages. On Recent Discussion I have so much of it that I think I don’t really pay attention to it.
I find this compelling (along with the “finding out which things matter that you didn’t realize mattered” point) and think this is a reason for us to begin doing A/B testing sometime in the not-too-distant future.
I see the spirit of what you’re saying and think there’s something to it though it doesn’t feel completely correct. That said, I don’t think anyone on the team has experience with that kind of A/B testing loop and given that lack of experience, we should try it out for at least a while on some projects.
To date, I’ve been working just to get us to have more of an analytics-mindset plus basic thorough analytics throughout the app, e.g. tracking on each of the features/buttons we build, etc. (This wasn’t trivial to do with e.g. Google Tag Manager so we’ve ended up building stuff in-house.) I think trying out A/B testing would likely make sense soon, but as above, I think there’s a lot of value even before it with more dumb/naive analytics.
We trialled FullStory for a few weeks and I agree it’s good, but we just weren’t using it enough to justify it. LogRocket offers a monthly subscription, though, and we’ll likely sign up for that soon. (Once we’re actually using it fully, not just trialling, we’ll need to post about it properly, build opt-out, etc., and be good around privacy—already in the trial we hid e.g. voting and usernames.)
To come back to the opening points in the OP, we probably shouldn’t get too bogged down trying to optimize specific simple metrics by getting all the buttons perfect, etc., given the uncertainty over which metrics are even correct to focus on. For example, there isn’t any clear metric (that I can think of) that definitively answers how much to focus on bringing in new users and getting them up to speed vs building tools for existing users already producing good intellectual progress. I think it’s correct that we have to use high-level models and fuzzier techniques to think about big project prioritization. A/B tests won’t resolve the most crucial uncertainties we have, though I do think they’re likely to be hugely helpful in refining our design sense.
This is roughly the procedure we usually follow.
Trying to optimize a metric without even having a test framework in place adds a lot of evidence to that story—certainly in my own start-up experience, we never had any idea what we were doing until well after the test framework was in place (at any of the companies I’ve worked at). Analytics more generally were also always crucial for figuring out where the low-hanging fruit was and which projects to prioritize, and it sounds like you guys are currently still flying blind in that department.
I think I agree with the general spirit here. Throughout my year with the LessWrong team, I’ve been progressively building out analytics infrastructure to reduce my sense of the “flying blind” you speak of. We’re not done yet, but I’ve now got a lot of data at my fingertips. I think the disagreement here would be over whether anything short of A/B testing is valuable. I’m pretty sure that it is.
A number of these projects were already on our docket, but less visible are the projects that were delayed, and the fact that those we selected might not have been done now otherwise. For example, if we hadn’t been doing metric quarter, I’d likely have spent more of my time continuing work on the Open Questions platform and much less of my time doing interviews and talking to authors. Admittedly, subscriptions and the new editor are projects we were already committed to and had been working on, but if we hadn’t thought they’d help with the metric, we’d have delayed them to the next quarter the way we did with many other project ideas.
We did brainstorm, but as Oli said, it wasn’t easy to come up with any ideas that were obviously much better.
Heckling appreciated. I’ll add a bit more to Habryka’s response.
Separate from the question of whether A/B tests would have been applicable to our projects, I’m not sure why you think it’s pointless to try to make inferences without them. True, A/B tests are cleaner and more definitive, and what we observed is plausibly what would have happened even with different activities, but that isn’t to say we don’t learn a lot when the outcome is one of a) metric/growth stays flat, b) small decrease, c) small increase, d) large decrease, e) large increase. In particular, the growth we saw (an increase in both absolute terms and rate) is suggestive of us doing something real, and is also strong evidence against the hypothesis that it’d be very easy to drive a lot of growth.
Generally, it’s at least suggestive that the first quarter where we explicitly focus on growth is one where we see 40% growth over the last quarter (compared to 20% growth in the quarter before that). It could be a coincidence, but I feel like there are still likelihood ratios here.
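To make the “likelihood ratios” remark concrete, here’s a minimal Bayes-update sketch. All the probabilities are made-up placeholders for illustration, not anything the team measured:

```python
# Illustrative Bayes update: how much should observing ~40% quarterly
# growth shift us toward "the growth push had a real effect"?
# Every number below is an assumed placeholder, not measured data.

p_growth_if_worked = 0.5   # assumed P(see ~40% growth | push had real effect)
p_growth_if_chance = 0.15  # assumed P(see ~40% growth | coincidence/noise)
prior_odds = 1.0           # even prior odds, for illustration

likelihood_ratio = p_growth_if_worked / p_growth_if_chance  # ~3.3
posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)      # ~0.77

print(f"LR = {likelihood_ratio:.2f}, posterior P(worked) = {posterior_prob:.2f}")
```

Even with modest assumed likelihoods like these, the observation shifts even odds to roughly three-to-one, which is the sense in which the data carries evidence without an A/B test.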
When it comes to attribution too, with some of these projects it’s easy to get much more of an idea even without A/B testing. I can look at the posts from authors who we contacted and reasonably believe would not have otherwise posted, and see how much karma that generated. Same for Petrov Days and MSFP.
Because they’re interested in weapons and making people distinctly not safe.
A friend in the AI space who visited Washington told me that military leaders distinctly do not like the term “safety”.
Recently clarified guidelines:
But sleeping pills aren’t just about stress, they’re very concretely about putting you to sleep which they do.
No matter how many false physical constraints we overturn the second law of thermodynamics seems to guarantee (this is debatable) that we will eventually hit a wall . . .
Granted that we will eventually hit a wall, there’s a good chance the wall is so unbelievably far off that it might as well not exist for another million or billion years, and allows for astronomical (literally) amounts of growth. Heck, even what we get out of the Earth alone could be increased by multiple orders of magnitude. Supposing there is a point at which we should think about slowing down, I think that point is very far away.
I’ll quote a bit from my summary of Eternity in Six Hours, which I find credible:
Travelling at 50% of c, there are 116 million galaxies reachable; at 80% of c, there are 762 million galaxies reachable; at 99% of c, you get 4.13 billion galaxies.
For reference, there are 100 to 400 billion stars in the Milky Way, and from a quick check it might be reasonable to assume 100 billion is the average galaxy.
The ability to colonize the universe as opposed to just the Milky Way is the difference between ~10^8 stars and ~10^16 or ~10^17 stars. A factor of 100 million.
Similarly, the sun’s estimated energy output is 3.8x10^26 W (joules per second), whereas civilization’s current energy usage is estimated at ~10^24 J/year in a recent year (2012 or 2014?). That’s something like 10 orders of magnitude more energy being emitted than we currently use (simplifying a whole bunch).
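The back-of-envelope arithmetic behind that comparison, using only the figures quoted above:

```python
import math

# Quick check: how much more energy does the sun emit per year
# than civilization currently uses? (Figures from the comment above.)

SUN_OUTPUT_W = 3.8e26        # sun's total power output, in watts (J/s)
CIV_USAGE_J_PER_YEAR = 1e24  # rough civilizational energy use per year
SECONDS_PER_YEAR = 3.15e7    # ~365 days, rounded

sun_output_per_year = SUN_OUTPUT_W * SECONDS_PER_YEAR  # ~1.2e34 J/year
ratio = sun_output_per_year / CIV_USAGE_J_PER_YEAR     # ~1.2e10

print(f"Sun emits ~{ratio:.1e}x our annual usage "
      f"(~{math.log10(ratio):.0f} orders of magnitude)")
```

The ratio comes out to roughly 1.2×10^10, i.e. about 10 orders of magnitude, with everything here simplified the same way the comment simplifies it.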
There’s finite and there’s finite; some of those finites are freaking huge. I say let’s get ’em.
But being more serious, if we think about the EV of different strategies, I think the EV of continuing to pursue growth (as jasoncrawford defines it) for the foreseeable future is better than very prematurely trying to limit growth and be “sustainable”, notwithstanding the risk that eventually there will be some kind of crunch.
Admittedly, I could be wrong about the limits of potential technological capabilities. If for some reason we hit a limit of what we can do far earlier, then there might be a wall far sooner than when we run out of energy. But even such a wall seems at least quite a ways off.
Good post. I like these examples, though I find the names you’ve given to each type not to be that evocative, and I expect to struggle to remember them or explain them to others. Also, sleeping pills do something real. Perhaps you mean something like sugar (placebo) pills? Or antibiotics for a cold?
I really appreciate seeing this kind of applied statistical analysis to a stray interesting-sounding fact you heard.