Data Analysis of LW: Activity Levels + Age Distribution of User Accounts

Epistemic status: I rarely trust other people’s data analysis, and I only half trust my own. Right now, analytics is only getting a slice of my attention and this work is not as thorough as I’d like, but I think the broad-strokes picture is correct. I have probably failed to include enough clarifications and disclaimers about where we should expect the data to be inaccurate. Feedback on my approach is welcome.

I’ve been doing some analytics work for the LessWrong 2.0 team since September last year (since March I’ve also been doing other work, but that’s not relevant here). This post will hopefully be the first in a series that clears the backlog of analytics results I’ve been wanting to share.

This post is probably not the ideal starting point (that would probably be a big-picture overview of LessWrong usage since the beginning), but it is some of my most recent work and therefore the easiest to share. Still, it does reveal something about the bigger picture.

Warning: The graphs are repetitive even though they’re showing different things. I’ve included them all for completeness, but you can just read my summaries and interpretations while looking at only some of them.

Distribution of User Account Age

Question: LW2 seems to be doing well, but is that just because we’re retaining or re-engaging a devoted base of older users despite not signing up new users?

Answer: Activity on LW2 is coming from both new and old users across all activity types (posts, comments, votes, and views). The project is succeeding at getting new people to create accounts and engage.

In fact, since LW2 launched there have consistently been more new users voting and viewing each month than at any point in LW’s past. The number of new users posting each month is roughly the same as historical levels. The number of new users commenting has declined (though the percentage of new users is roughly the same); however, this is consistent with the trend that comment volume on LW2 has not recovered from The Great Decline of 2015-2017 the way other metrics have.

Meaning of the Graphs

I plotted graphs for each activity type (posts, comments, votes, and views) and the corresponding population of users who engage in those activities. For each user engaging in each activity type, I calculated the “age” of that account since it first engaged in that activity type, i.e. in the graph for users who post, the age of a user account in a given month is the number of days elapsed since that account first posted. In the graph for commenters, it is the number of days elapsed since the account first commented. This avoids certain complications arising from inconsistencies in how data was recorded for different activities over LessWrong’s history.

I segmented the user accounts into four “buckets” based on their “age” [since first engaging in activity of type X].

  • 0–90 days

  • 90–360 days (~3 months to 12 months old)

  • 360–720 days (~1 to 2 years)

  • 720–10,000 days (~2+ years)

By “new user accounts” I mean the 0–90 day bucket; by “old users” I mean the 720+ day bucket.
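A minimal Python sketch of the bucketing described above may make the “age since first activity of this type” definition concrete. It is not the code actually used for this analysis: the input format (a list of `(user_id, date)` pairs for a single activity type), the half-open bucket boundaries, the function names, and the choice to measure each user’s age at their earliest event within a given month are all assumptions.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Tuple

# Age buckets in days. Treating them as half-open intervals [lo, hi) is an
# assumption; the post only gives the edges 0, 90, 360, 720 and 10,000.
BUCKETS = [
    (0, 90, "0-90 days"),
    (90, 360, "90-360 days"),
    (360, 720, "360-720 days"),
    (720, 10_000, "720+ days"),
]


def age_bucket(age_days: int) -> str:
    """Return the label of the bucket that contains age_days."""
    for lo, hi, label in BUCKETS:
        if lo <= age_days < hi:
            return label
    raise ValueError(f"age out of range: {age_days}")


def bucket_users_by_month(events: Iterable[Tuple[str, date]]):
    """Assign each active user to exactly one age bucket per calendar month.

    events: (user_id, date) pairs for a SINGLE activity type (e.g. only
    comments). Age is measured from the user's first event of that type,
    not from account creation (hypothetical helper, for illustration).
    Returns {(year, month): {bucket_label: set_of_user_ids}}.
    """
    first_seen = {}      # user -> date of first event of this type
    first_in_month = {}  # (user, (year, month)) -> earliest event that month
    for user, d in events:
        if user not in first_seen or d < first_seen[user]:
            first_seen[user] = d
        key = (user, (d.year, d.month))
        if key not in first_in_month or d < first_in_month[key]:
            first_in_month[key] = d

    by_month = defaultdict(lambda: defaultdict(set))
    for (user, month), d in first_in_month.items():
        age = (d - first_seen[user]).days
        by_month[month][age_bucket(age)].add(user)
    return by_month
```

Measuring age once per user per month (rather than per event) keeps each user in a single bucket even if they cross a bucket boundary mid-month.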

Caveat: we believe that many old users created new accounts when LW2 launched, which somewhat confounds the data, though not necessarily by much.

Reading the Graphs

  • X-Axis is time

  • Values plotted are the total values for each month

  • Y-axis is the number of individuals engaging in a behavior in a given month, e.g. there were ~600 people who viewed posts, 30% of whom had been first recorded viewing a post as a logged-in user less than 90 days earlier***.

  • In each set of graphs by activity:

    • The first set (a 2x2 grid) shows a separate time series for each age bucket.

    • The second, long graph shows a stacked area plot of the age buckets over time. This lets you see the overall size of the population engaging in an activity type over time.

    • The third graph is a 100% area plot which shows the composition of the overall population by “age” of the user accounts over time.

  • A three-month moving-average filter has been applied for smoothing.

***All data here is from logged-in users, including views! View counts from non-logged-in users are over an order of magnitude higher.
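The smoothing and the 100% area view can be sketched roughly as follows. This is an illustration only: whether the original filter was trailing or centered isn’t stated, so the trailing window (with partial averages for the first two months) is an assumption, as is the requirement that each input series hold one count per consecutive month. The function names are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average over monthly counts.

    Assumes one value per consecutive month. The first window-1 points
    average only the months available so far (an assumption).
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def composition(series):
    """Percentage share of each age bucket per month (the 100% area view).

    series: {bucket_label: [count per month]}, all lists the same length.
    Months with no active users get 0.0 for every bucket.
    """
    if not series:
        return {}
    labels = list(series)
    n_months = len(series[labels[0]])
    shares = {label: [] for label in labels}
    for i in range(n_months):
        total = sum(series[label][i] for label in labels)
        for label in labels:
            shares[label].append(100 * series[label][i] / total if total else 0.0)
    return shares
```

For example, a month with 30 new and 70 old viewers gives shares of 30.0 and 70.0, which is how a statement like “30% of ~600 viewers are new” would be read off the third graph.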

Poster Distribution of Age

Questions and posts with 2 or fewer upvotes have been filtered out. Event/meetup posts have not.

In addition to our primary focus on the age distribution of accounts, we can note the inflection point in September/October 2017. This corresponds to the launch of the LessWrong 2.0 Open Beta on 2017-09-20 and the publication of Eliezer’s Inadequate Equilibria on LW* on 2017-10-28. I have marked 2017-10-01 on the graphs with a dotted black line.

*The LW2 team requested that Inadequate Equilibria be published on LW2 as an initial draw.

We see from these graphs that the number of users making posts each month is nearly as high as historical levels (~75% of them), especially relative to the years after 2013.

Unsurprisingly, more profiles fall into the “2+ years since their first post” bucket over time: the longer LW has existed, the more accounts can have been posting for 2+ years. The percentage of posting users whose first post was less than 90 days ago (this includes their first post) has remained almost constant over time, except during the decline period of 2015-2017.

A small aside: it’s interesting to note that the nature of posts has shifted somewhat. The median post length on LW2 (~1000 words) is double that on old LW (~500 words). On old LW, Main posts were much longer than Discussion posts (median ~1000 words vs ~300 words). The distribution of post length on LW2 almost exactly matches that of LW’s Main section, despite LW2 having far more posts. The net result is that at least as many total words of posts are being written on LW2 as on legacy LW.

I’m including some analysis from a few months ago. I haven’t re-checked it before including it here, so there is a slightly higher chance that I messed something up.

Post Length Distributions for LW1 Discussion, LW1 Main, and LW2

Word count is naively calculated as character count divided by 6, hence the fractional values.
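For concreteness, the naive word count and the median comparison look roughly like this. Only the divide-by-6 rule comes from the analysis above; the function names and the idea of passing raw post bodies as strings are hypothetical.

```python
from statistics import median

CHARS_PER_WORD = 6  # the naive conversion used for the post-length graphs


def naive_word_count(body: str) -> float:
    """Approximate word count as characters / 6, hence fractional values."""
    return len(body) / CHARS_PER_WORD


def median_length(posts):
    """Median naive word count over an iterable of post bodies."""
    return median(naive_word_count(p) for p in posts)
```

Under this rule a 6,000-character post counts as 1,000 words, so the ~500 vs ~1000 word medians correspond to roughly 3,000 vs 6,000 characters.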

I vaguely suspect that the shift in post length signifies a change in how LessWrong is used, and that this is related to the large reduction in comment volume (see next section). One hypothesis is that old LW served some of the same needs that Facebook and other social media currently fulfill for people, and that LW2 is now primarily serving some other need.

Commenter Distribution of Age

The graphs for commenters reveal a significant reality for LW2: while posting, voting, and viewing have resurged since The Great Decline of 2015-2017, commenting levels have not returned to anything near historical levels. Since the LW2.0 launch, the percentage of commenters who are new commenters has been at its highest level since 2013, while commenters who began commenting 2+ years ago have held steady at 50% of commenters. The top-left graph (blue line) shows that there were no new users commenting in the period before LW2, but that this changed with the launch of LW2 and Inadequate Equilibria.

Voter Distribution of Age

The graphs for the population of voters tell an interesting tale. There has been a dramatic increase in the number of new users voting, while the number and proportion of accounts that first voted 2+ years ago have stayed roughly steady or declined a little. The net result is that voters who first voted within the last two years make up 60% of the voting population. (The effect on the overall distribution of karma can’t be straightforwardly inferred from this alone, since it depends on how many votes each user makes and on their karma scores.)

Logged-In Viewer Distribution of Age

The distribution of user account age for logged-in viewers is similar to that for voters: a large uptick in new accounts, yet no growth among older accounts. The data here, however, is “compromised”, since in March 2018 all users were logged out. Users who failed to log in again (which is unnecessary if you are not posting, commenting, or voting) would no longer be detected. The drop in the logged-in viewer population can be seen in early 2018, particularly in the 2+ year time series (red line). After that point, it is mostly flat, similar to the case for voters.

Concluding Thoughts

It’s heartening to see that LW2 has made a difference to the trajectory of LW. A site which was nearly put into read-only archive mode is definitely alive and kicking. Counter to my fears, LW2 is drawing in new users rather than purely being sustained by a committed core of older users from LW1. This is despite not yet focusing on recruiting new users, e.g. via promotion of content on social media.

However, the number of new users arriving is matched by the number of users failing to return (be they older users or new users who aren’t sticking around). Overall, most of the straightforward analytics metrics for LW have not grown significantly since LW2’s launch. I suspect that if we understand what is going on with retention, we might be able to hold onto more of the new users and actually cause upward growth... assuming we want that. Others on the team and I don’t blindly believe that growth for the sake of growth is good. We’ll continue to think carefully about any actions that, even if they caused “growth”, might make LW2 less the place we want it to be.

I’ve only had a very cursory look at retention. I found that new users were returning in the first month after signing up at historical levels (~30%), but that retention three months after joining is less than half of historical levels (from ~20% to ~5%). This is odd. However, these numbers come from only a quick look, and I haven’t been thorough yet, either in checking for coding mistakes or in thinking the question through properly. This paragraph is low confidence.
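Since the retention figures above are low confidence, it may help to pin down one possible definition. The sketch below treats “retained in month k” as “active at least once during the k-th calendar month after the signup month”; that definition, the input dictionaries, and the function names are assumptions, not necessarily what produced the ~30% and ~5% figures.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so month differences are plain subtraction."""
    return d.year * 12 + (d.month - 1)


def retention_rate(join_dates, activity, k):
    """Fraction of a signup cohort active in the k-th month after joining.

    join_dates: {user_id: signup date} for the cohort being measured.
    activity:   {user_id: iterable of dates the user was active}.
    k=1 means the calendar month right after the signup month. The caller
    should exclude users whose k-th month has not happened yet.
    """
    if not join_dates:
        return 0.0
    retained = 0
    for user, joined in join_dates.items():
        target = month_index(joined) + k
        if any(month_index(d) == target for d in activity.get(user, ())):
            retained += 1
    return retained / len(join_dates)
```

Calendar months are the simplest choice here; a “days 31 to 60 after signup” window would be an equally reasonable definition and could give somewhat different numbers.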

Another point is that users of LW2 don’t, on average, use the site as much as users did during LW’s peak. Up to 2014, the median user was present on LW on 4-5 days each month (i.e. about 4 of 31 days); in the last couple of years, that has been 2-3 days. This might be related to fewer people commenting, since being engaged in ongoing comment threads might keep people coming back. The team is curious how a revamped email notification system (currently under development) will affect how often people visit LW.
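The “days present per month” metric could be computed along these lines. This is a sketch under assumptions: visits are given as `(user_id, date)` pairs, the function name is hypothetical, and only users with at least one visit in the month count toward the median.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_days_present(visits, year, month):
    """Median number of distinct days each active user visited in a month.

    visits: iterable of (user_id, date) pairs, one per recorded visit;
    repeat visits on the same day count once. Returns 0 if nobody visited.
    """
    days = defaultdict(set)  # user -> set of distinct dates visited
    for user, d in visits:
        if d.year == year and d.month == month:
            days[user].add(d)
    if not days:
        return 0
    return median(len(s) for s in days.values())
```

Because only active users are included, a drop in this median reflects people visiting less often, not people leaving entirely.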

Lastly, and I think it’s okay for me to say this, many of the most significant past contributors to LW are still present on the site, lurking, even if they post and comment far less. (I will soon write a post on how we use user data for analytics and decision-making; rest assured we have an extremely strict policy against ever sharing individual user data that is not public.) I think it’s a good sign that LW2 is generating enough content and discussion that these users still want to keep up to date with it. It makes me hopeful that LW2 might even become a (the?) central place of discussion.

For those interested in working with LessWrong data

To protect user privacy, we’re not able to grant the public full access to our database; however, there might be more limited datasets that we can release. If there’s enough interest, I’ll discuss this with the team.