Data Analysis of LW: Activity Levels + Age Distribution of User Accounts

Epistemic: I rarely trust other people’s data analysis, and I only half trust my own. Right now, analytics is only getting a slice of my attention and this work is not as thorough as I’d like, but I think the broad-strokes picture is correct. I have probably failed to include enough clarifications and disclaimers on where we should expect the data to be inaccurate. Feedback on my approach is welcome.

I’ve been doing some analytics work for the LessWrong 2.0 team since September last year (since March I’ve been doing other work too, but that’s not relevant here). This post will hopefully be the first of a series which will eliminate the backlog of analytics results I’ve been wanting to share.

This post is probably not the ideal starting point (that would probably be a big-picture general overview of LessWrong usage since the beginning), but it is some of my most recent work and is therefore easiest to share. Still, it does show things about the bigger picture.

Warn­ing: The graphs are repet­i­tive even though they’re show­ing differ­ent things. I’ve in­cluded them all for com­plete­ness, but you can just read my sum­mary/​in­ter­pre­ta­tions while look­ing at only some of them.

Distri­bu­tion of User Ac­count Age

Ques­tion: LW2 seems to be do­ing well, but is that just be­cause we’re re­tain­ing/​re-en­gag­ing a de­voted base of older users de­spite not sign­ing up new users?

An­swer: Ac­tivity on LW2 is com­ing from both new and old users across all ac­tivity types (posts, com­ments, votes, and views). The pro­ject is suc­ceed­ing at get­ting new peo­ple to cre­ate ac­counts and en­gage.

In fact, there have consistently been more new users voting and viewing each month since LW2 launched than throughout LW’s past. The number of new users posting each month is roughly the same as historical levels. The number of new users who are commenting has declined (though the percentage of new users is roughly the same); however, this is consistent with the trend that comment volume on LW2 has not recovered from The Great Decline of 2015-2017 the way other metrics have.

Mean­ing of the Graphs

I plotted graphs for each activity type (posts, comments, votes, and views) and the corresponding population of users which engages in those activities. For each user engaging in each activity type, I calculated the “age” of that account since it first engaged in that activity type, i.e. in the graph for users who post, the age of the user account in a given month is the number of days elapsed since that account first posted. In the graph for commenters, it is the days elapsed since the account first commented. This avoids certain complications arising from inconsistencies in how the data was recorded for different activities over LessWrong’s history.

I seg­mented the user ac­counts into four “buck­ets” based on their “age” [since first en­gag­ing in ac­tivity of type X].

  • 0 − 90 days

  • 90 − 360 days (~3 months to 12 months old)

  • 360 − 720 days (~1 to 2 years)

  • 720 − 10,000 days (~2+ years)

When I’ve said new user ac­counts, I have been mean­ing 0 − 90 days; when I’ve said old users, I’ve meant the 720+ days bucket.

Caveat: we be­lieve that many old users cre­ated new ac­counts when LW2 launched and this is some­what con­found­ing the data, though not nec­es­sar­ily a lot.
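The age calculation and bucketing described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the graphs. It assumes the activity log is available as (user_id, date) pairs for a single activity type, that bucket boundaries are half-open (lower inclusive, upper exclusive), and that a user’s age in a month is measured at their earliest event in that month. The function names and the input format are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Bucket edges in days, taken from the list above. Treating each bucket as
# [lower, upper) is an assumption; the post does not say which side is inclusive.
BUCKETS = [(0, 90), (90, 360), (360, 720), (720, 10_000)]


def bucket_for(age_days):
    """Return the (lower, upper) bucket containing age_days, or None."""
    for lower, upper in BUCKETS:
        if lower <= age_days < upper:
            return (lower, upper)
    return None


def monthly_bucket_counts(events):
    """Count distinct active users per month, split by account-age bucket.

    events: iterable of (user_id, date) pairs for ONE activity type.
    Returns {(year, month): Counter({bucket: number_of_distinct_users})}.
    """
    events = list(events)

    # "Age" is measured from the account's first event of this activity type.
    first_seen = {}
    for user, day in events:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # Assumption: a user's age in a month is taken at their earliest event
    # in that month, so each user is counted once per month.
    earliest_in_month = {}
    for user, day in events:
        key = ((day.year, day.month), user)
        if key not in earliest_in_month or day < earliest_in_month[key]:
            earliest_in_month[key] = day

    counts = defaultdict(Counter)
    for (month, user), day in earliest_in_month.items():
        age_days = (day - first_seen[user]).days
        counts[month][bucket_for(age_days)] += 1
    return counts


# Hypothetical example data: alice first posts in January, bob in June.
example_events = [
    ("alice", date(2017, 1, 5)),
    ("alice", date(2017, 6, 1)),
    ("bob", date(2017, 6, 10)),
    ("bob", date(2017, 6, 20)),
]
example_counts = monthly_bucket_counts(example_events)
```

Running the same function separately on the post, comment, vote, and view logs gives one set of bucket counts per activity type, which is why a given account can be “new” as a poster while being “old” as a voter.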

Read­ing the Graphs

  • X-Axis is time

  • Values plot­ted are the to­tal val­ues for each month

  • Y-Axis is the number of individuals engaging in a behavior in a given month, e.g. there were ~600 people who viewed posts, 30% of whom had fewer than 90 days elapsed since they were first recorded viewing a post as a logged-in user***.

  • In each set of graphs by ac­tivity:

    • The first set (2x2) shows a time se­ries line for each age bucket seg­ment alone.

    • The second long graph shows an area plot time series with the age bucket segments stacked. This lets you see the overall size of the population engaging in an activity type over time.

    • The third graph is a 100% area plot which shows the com­po­si­tion of the over­all pop­u­la­tion by “age” of the user ac­counts over time.

  • A moving-average filter of three months has been applied for smoothing.

***All data here is from logged-in users!!! Including views. View counts of non-logged-in users are over an order of magnitude higher.
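For readers who want to reproduce the smoothing and the 100% area plot, here is a minimal sketch of both transformations. It assumes a trailing window (the current month plus the two before it) that simply shortens at the start of the series. Whether the smoothing behind the graphs was trailing or centered is not specified here, so treat that choice as an assumption. The function names are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average over monthly values.

    Assumption: the window covers the current month and the (window - 1)
    months before it; the first months average over whatever is available.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def composition(bucket_series):
    """Convert per-bucket monthly counts into shares of the monthly total.

    bucket_series: {bucket_name: [count for each month]}, all the same length.
    Returns {bucket_name: [share between 0 and 1 for each month]}.
    """
    n_months = len(next(iter(bucket_series.values())))
    totals = [
        sum(series[i] for series in bucket_series.values())
        for i in range(n_months)
    ]
    return {
        name: [c / t if t else 0.0 for c, t in zip(series, totals)]
        for name, series in bucket_series.items()
    }
```

The stacked area plot uses the smoothed counts directly, while the 100% area plot uses the shares returned by composition.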

Poster Distri­bu­tion of Age

Questions and posts with 2 or fewer upvotes have been filtered out. Event/meetup posts have not.

In addition to our primary focus on the age distribution of accounts, we can note the inflection point occurring in September/October 2017. This corresponds to the launch of the LessWrong 2.0 Open Beta on 9-20 and the publishing of Eliezer’s Inadequate Equilibria on LW* on 10-28. I have marked 2017-10-01 on the graphs with a dotted black line.

*The LW2 team re­quested Inad­e­quate Equil­ibria be pub­lished on LW2 as an ini­tial draw.

We see from these graphs that the num­ber of users mak­ing posts each month is al­most as high (~75%) as his­tor­i­cal lev­els, es­pe­cially those af­ter 2013.

Unsurprisingly, over time more profiles fall into the “2+ years since their first post” bucket: the longer LW has existed, the more profiles can be at 2+ years. The percentage of users posting with accounts less than 90 days since first post (this includes their first post) has remained almost constant over time, with the exception of the decline period 2015-2017.

A small aside: it’s interesting to note that the nature of posts has shifted somewhat. The median post length on LW2 (~1000 words) is double that of old LW (~500 words). On old LW, Main posts were on average much longer than Discussion posts (median ~1000 words vs ~300 words). The distribution of post length on LW2 almost exactly matches that of LW’s Main section despite having far more posts. The net result is that at least as many total words of posts are being written on LW2 compared to legacy LW.

Insert­ing some anal­y­sis from a few months ago. I haven’t re-checked this be­fore in­clud­ing though, so slightly higher chance that I messed some­thing up.

Post Length Distri­bu­tions for LW1 Dis­cus­sion, LW1 Main, and LW2

Word count is naively calcu­lated as char­ac­ter count di­vided by 6, hence the frac­tional val­ues.
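As a concrete illustration of that estimate, here is a minimal sketch. The divisor of 6 is the one stated above. Counting every character, including spaces and punctuation, is an assumption, and the function name is hypothetical.

```python
from statistics import median

CHARS_PER_WORD = 6  # divisor stated in the post


def naive_word_count(text):
    """Estimate word count as character count / 6 (can be fractional).

    Assumption: every character counts, including whitespace and punctuation.
    """
    return len(text) / CHARS_PER_WORD


# Hypothetical post bodies, used only to show the median calculation.
example_posts = ["a" * 600, "b" * 3000, "c" * 6001]
example_median = median(naive_word_count(p) for p in example_posts)
```

Because the estimate is a plain division, any post whose character count is not a multiple of 6 gets a fractional word count.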

I vaguely suspect that the shift in post length signifies a change in how LessWrong is used and that this is related to the large reduction in comment volume (see next section). A hypothesis is that old LW served some of the same needs that Facebook and other social media currently fulfil for people, and that LW2 is now primarily serving some other need.

Com­menter Distri­bu­tion of Age

The graphs for commenters reveal a significant reality for LW2: while posting, voting, and viewing have resurged since The Great Decline of 2015-2017, commenting levels have not returned to anything near historical levels. Since the LW2.0 launch, the percentage of commenters who are new commenters is at its highest level since 2013, while the share of commenters who began commenting 2+ years ago has been steady at 50% of commenters. The topmost left graph (blue line) shows that there were no new users commenting in the period before LW2 but that this changed with the launch of LW2 and Inadequate Equilibria.

Voter Distri­bu­tion of Age

The graphs for pop­u­la­tion of vot­ers tell an in­ter­est­ing tale. There has been a dra­matic in­crease in the num­ber of new users vot­ing while the num­ber and pro­por­tion of ac­counts who first voted 2+ years ago has stayed al­most steady/​de­clined a lit­tle. The net re­sult is that vot­ers who first voted within the last two years are mak­ing up 60% of the vot­ing pop­u­la­tion. (The effects on over­all karma dis­tributed can’t be straight­for­wardly in­ferred from this alone since it will de­pend on how many votes each user makes and their karma scores.)

Logged-In Viewer Distri­bu­tion of Age

The distribution of user account age for logged-in viewers is similar to that for voters: a large uptick for new accounts, yet no growth among older user accounts. The data here, however, is “compromised” since in March 2018 all users were logged out. Users who failed to log in again (which is unnecessary if you are not posting, commenting, or voting) would no longer be detected. The drop in logged-in viewer population can be seen in early 2018, particularly in the 2+ year time series (red line). After that point, it is mostly flat, similar to the case for voters.

Con­clud­ing Thoughts

It’s heart­en­ing to see that LW2 has made a differ­ence to the tra­jec­tory of LW. A site which was nearly put into read-only archive mode is definitely al­ive and kick­ing. Counter to my fears, LW2 is draw­ing in new users rather than purely be­ing sus­tained by a com­mit­ted core of older users from LW1. This is de­spite not yet fo­cus­ing on re­cruit­ing new users, e.g. via pro­mo­tion of con­tent on so­cial me­dia.

However, the rate at which new users arrive is matched by the rate at which users fail to return (be they older users or new users who aren’t sticking around). Overall, most of the straightforward analytics metrics for LW have not grown significantly since LW2’s launch. I suspect that if we understand what is going on with retention, we might be able to hold onto more of the new users and actually cause upwards growth... assuming we want that. I and others on the team don’t blindly believe that growth for the sake of growth is good. We’ll continue to think carefully about any actions that, even if they caused “growth”, might cause LW2 not to be the place we want it to be.

I’ve only had a very cursory look at retention. I found that new users were returning in the first month after signing up at historical levels (~30%), but that retention three months after joining is less than half of historical levels (~20% -> 5%). This is odd. However, these numbers are only from a cursory glance, and I haven’t yet been very thorough either in checking for coding mistakes or even in thinking about it correctly. This paragraph is low confidence.

Another point is that users of LW2 don’t use the site, on average, as much as they did during LW’s peak. Up to 2014, the median user was present on LW for 4-5 days each month (i.e. 4/31 days); in the last couple of years that has been 2-3 days. This might correspond to fewer people commenting, since being engaged in ongoing comment threads might keep people coming back. The team is curious how a revamped email notification system (currently under development) will affect frequency of visiting LW.
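One way a “days present per month” figure like this could be computed is sketched below. It assumes a log of (user_id, date) visit records and takes the median only over users with at least one recorded visit in the month. Both the input format and that population choice are assumptions rather than a description of the actual analysis, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_days_present(visits):
    """Median number of distinct active days per user, for each month.

    visits: iterable of (user_id, date) pairs; repeat visits on one day are fine.
    Assumption: users with no visits in a month are left out of that month's
    median rather than counted as zero days.
    Returns {(year, month): median_days}.
    """
    active_days = defaultdict(set)
    for user, day in visits:
        active_days[((day.year, day.month), user)].add(day)

    per_month = defaultdict(list)
    for (month, _user), days in active_days.items():
        per_month[month].append(len(days))

    return {month: median(counts) for month, counts in per_month.items()}


# Hypothetical example: a is present 2 days, b 1 day, c 3 days in January 2014.
example_visits = [
    ("a", date(2014, 1, 1)),
    ("a", date(2014, 1, 1)),
    ("a", date(2014, 1, 2)),
    ("b", date(2014, 1, 5)),
    ("c", date(2014, 1, 7)),
    ("c", date(2014, 1, 8)),
    ("c", date(2014, 1, 9)),
]
example_result = median_days_present(example_visits)
```

Counting absent users as zero days would pull the median down, so the choice of population matters when comparing figures across periods.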

Lastly, and I think it’s okay for me to say this: many of the most significant contributors to LW in the past are still present on the site, lurking, even if they post and comment far less. (I will soon write a post on how we use user data for analytics and decision-making; rest assured we have an extremely hard policy against ever sharing individual user data which is not public.) I think it’s a good sign that LW2 is generating enough content and discussion that these users still want to keep up to date with LW2. It makes me hopeful that LW2 might even become a (the?) central place of discussion.

For those in­ter­ested in work­ing with LessWrong data

To protect user privacy, we’re not able to grant the public full access to our database; however, there might be more limited datasets which we can release. If there’s enough interest, I’ll discuss this with the team.