Wikipedia usage survey results

Summary

The sum­mary is not in­tended to be com­pre­hen­sive. It high­lights the most im­por­tant take­aways you should get from this post.

  • Vipul Naik and I are in­ter­ested in un­der­stand­ing how peo­ple use Wikipe­dia. One rea­son is that we are get­ting more peo­ple to work on edit­ing and adding con­tent to Wikipe­dia. We want to un­der­stand the im­pact of these ed­its, so that we can di­rect efforts more strate­gi­cally. We are also cu­ri­ous!

  • From May to July 2016, we conducted two surveys of people’s Wikipedia usage. We collected survey responses from audience segments including Slate Star Codex readers, Vipul’s Facebook friends, and a few audiences through SurveyMonkey Audience and Google Consumer Surveys. Our survey questions measured how heavily people use Wikipedia, what sort of pages they read or expected to find, the relation between their search habits and Wikipedia, and other actions they took within Wikipedia.

  • Differ­ent au­di­ence seg­ments re­sponded very differ­ently to the sur­vey. Notably, the Sur­veyMon­key au­di­ence (which is closer to be­ing rep­re­sen­ta­tive of the gen­eral pop­u­la­tion) ap­pears to use Wikipe­dia a lot less than Vipul’s Face­book friends and Slate Star Codex read­ers. Their con­sump­tion of Wikipe­dia is also more pas­sive: they are less likely to ex­plic­itly seek Wikipe­dia pages when search­ing for a topic, and less likely to en­gage in ad­di­tional ac­tions on Wikipe­dia pages. Even the col­lege-ed­u­cated Sur­veyMon­key au­di­ence used Wikipe­dia very lit­tle.

  • This is ten­ta­tive ev­i­dence that Wikipe­dia con­sump­tion is skewed to­wards a cer­tain pro­file of peo­ple (and Vipul’s Face­book friends and Slate Star Codex read­ers sam­ple much more heav­ily from that pro­file). Even more ten­ta­tively, these heavy users tend to be more “elite” and in­fluen­tial. This ten­ta­tively led us to re­vise up­ward our es­ti­mates of the so­cial value of a Wikipe­dia pageview.

  • This was my first ex­er­cise in sur­vey con­struc­tion. I learned a num­ber of les­sons about sur­vey de­sign in the pro­cess.

  • All the sur­vey ques­tions, as well as the break­down of re­sponses for each of the au­di­ence seg­ments, are de­scribed in this post. Links to PDF ex­ports of re­sponse sum­maries are at the end of the post.

Background

At the end of May 2016, Vipul Naik and I cre­ated a Wikipe­dia us­age sur­vey to gauge the us­age habits of Wikipe­dia read­ers and ed­i­tors. Sur­veyMon­key al­lows the use of differ­ent “col­lec­tors” (i.e. sur­vey URLs that keep re­sults sep­a­rate), so we cir­cu­lated sev­eral differ­ent URLs among four lo­ca­tions to see how differ­ent au­di­ences would re­spond. The au­di­ences were as fol­lows:

  • Sur­veyMon­key’s United States au­di­ence with no de­mo­graphic filters (62 re­sponses, 54 of which are full re­sponses)

  • Vipul Naik’s timeline (post ask­ing peo­ple to take the sur­vey; 70 re­sponses, 69 of which are full re­sponses). For back­ground on Vipul’s timeline au­di­ence, see his page on how he uses Face­book.

  • The Wikipe­dia An­a­lyt­ics mailing list (email link­ing to the sur­vey; 7 re­sponses, 6 of which are full re­sponses). Note that due to the small size of this group, the re­sults be­low should not be trusted, un­less pos­si­bly when the votes are de­ci­sive.

  • Slate Star Codex (post that links to the sur­vey; 618 re­sponses, 596 of which are full re­sponses). While Slate Star Codex isn’t the same as LessWrong, we think there is sig­nifi­cant over­lap in the two sites’ au­di­ences (see e.g. the re­cent LessWrong di­as­pora sur­vey re­sults).

  • In ad­di­tion, al­though not an ac­tual au­di­ence with a sep­a­rate URL, sev­eral of the ta­bles we pre­sent be­low will in­clude an “H” group; this is the heavy users group of peo­ple who re­sponded by say­ing they read 26 or more ar­ti­cles per week on Wikipe­dia. This group has 179 peo­ple: 164 from Slate Star Codex, 11 from Vipul’s timeline, and 4 from the An­a­lyt­ics mailing list.

We ran the sur­vey from May 30 to July 9, 2016 (al­though only the Slate Star Codex sur­vey had a re­sponse past June 1).

After we looked at the sur­vey re­sponses on the first day, Vipul and I de­cided to cre­ate a sec­ond sur­vey to fo­cus on the parts from the first sur­vey that in­ter­ested us the most. The sec­ond sur­vey was only cir­cu­lated among Sur­veyMon­key’s au­di­ences: we used Sur­veyMon­key’s US au­di­ence with no de­mo­graphic filters (54 re­sponses), as well as a US au­di­ence of ages 18–29 with a col­lege or grad­u­ate de­gree (50 re­sponses). We first ran the sur­vey on the un­filtered au­di­ence again be­cause the word­ing of our first ques­tion was changed and we wanted to have the new baseline. We then chose to filter for young col­lege-ed­u­cated peo­ple be­cause our pre­dic­tion was that more ed­u­cated peo­ple would be more likely to read Wikipe­dia (the Sur­veyMon­key de­mo­graphic data does not in­clude ed­u­ca­tion, and we hadn’t seen the Pew In­ter­net Re­search sur­veys in the next sec­tion, so we were rely­ing on our in­tu­ition and some de­mo­graphic data from past sur­veys) and be­cause young peo­ple in our first sur­vey gave more in­for­ma­tive free-form re­sponses in sur­vey 2 (Sur­veyMon­key’s de­mo­graphic data does in­clude age).

We ran a third survey on Google Consumer Surveys with a single question that was a word-for-word replica of the first question from the second survey. The main motivation here was that on Google Consumer Surveys, a single-question survey costs only 10 cents per response, so it was possible to get a large number of responses at relatively low cost, and achieve more confidence in the tentative conclusions we had drawn from the SurveyMonkey surveys.

Pre­vi­ous surveys

Sev­eral de­mo­graphic sur­veys re­gard­ing Wikipe­dia have been con­ducted, tar­get­ing both ed­i­tors and users. The sur­veys we found most helpful were the fol­low­ing:

  • The 2010 Wikipe­dia sur­vey by the Col­lab­o­ra­tive Creativity Group and the Wiki­me­dia Foun­da­tion. The ex­pla­na­tion be­fore the bot­tom table on page 7 of the overview PDF has “Con­trib­u­tors show slightly but sig­nifi­cantly higher ed­u­ca­tion lev­els than read­ers”, which pro­vides weak ev­i­dence that more ed­u­cated peo­ple are more likely to en­gage with Wikipe­dia.

  • The Global South User Sur­vey 2014 by the Wiki­me­dia Foundation

  • Pew In­ter­net Re­search’s 2011 sur­vey: “Ed­u­ca­tion level con­tinues to be the strongest pre­dic­tor of Wikipe­dia use. The col­lab­o­ra­tive en­cy­clo­pe­dia is most pop­u­lar among in­ter­net users with at least a col­lege de­gree, 69% of whom use the site.” (page 3)

  • Pew In­ter­net Re­search’s 2007 survey

Note that we found the Pew In­ter­net Re­search sur­veys af­ter con­duct­ing our own two sur­veys (and dur­ing the write-up of this doc­u­ment).

Motivation

Vipul and I ultimately want to get a better sense of the value of a Wikipedia pageview (one way to measure the impact of content creation), and one way to do this is to understand how people are using Wikipedia. As we focus on getting more people to work on editing Wikipedia – thus causing more people to read the content we pay for and help to create – it becomes more important to understand what people are doing on the site.

For some pre­vi­ous dis­cus­sion, see also Vipul’s an­swers to the fol­low­ing Quora ques­tions:

Wikipe­dia al­lows rel­a­tively easy ac­cess to pageview data (es­pe­cially by us­ing tools de­vel­oped for this pur­pose, in­clud­ing one that Vipul made), and there are some sur­veys that provide de­mo­graphic data (see “Pre­vi­ous sur­veys” above). How­ever, af­ter look­ing around, it was ap­par­ent that the kind of in­for­ma­tion our sur­vey was de­signed to find was not available.

I should also note that we were driven by our curiosity about how people use Wikipedia.

Sur­vey ques­tions for the first survey

For reference, here are the survey questions for the first survey. A dummy/mock-up version of the survey can be found here: https://www.surveymonkey.com/r/PDTTBM8.

The sur­vey in­tro­duc­tion said the fol­low­ing:

This sur­vey is in­tended to gauge Wikipe­dia use habits. This sur­vey has 3 pages with 5 ques­tions to­tal (3 on the first page, 1 on the sec­ond page, 1 on the third page). Please try your best to an­swer all of the ques­tions, and make a guess if you’re not sure.

And the ac­tual ques­tions:

  1. How many dis­tinct Wikipe­dia pages do you read per week on av­er­age?

    • less than 1

    • 1 to 10

    • 11 to 25

    • 26 or more

  2. On a search en­g­ine (e.g. Google) re­sults page, do you ex­plic­itly seek Wikipe­dia pages, or do you pas­sively click on Wikipe­dia pages only if they show up at the top of the re­sults?

    • I ex­plic­itly seek Wikipe­dia pages

    • I have a slight prefer­ence for Wikipe­dia pages

    • I just click on what is at the top of the results

  3. Do you usu­ally read a par­tic­u­lar sec­tion of a page or the whole ar­ti­cle?

    • Par­tic­u­lar section

    • Whole page

  4. How of­ten do you do the fol­low­ing? (Choices: Sev­eral times per week, About once per week, About once per month, About once per sev­eral months, Never/​al­most never.)

    • Use the search func­tion­al­ity on Wikipedia

    • Be sur­prised that there is no Wikipe­dia page on a topic

  5. For what frac­tion of pages you read do you do the fol­low­ing? (Choices: For ev­ery page, For most pages, For some pages, For very few pages, Never. Th­ese were dis­played in a ran­dom or­der for each re­spon­dent, but dis­played in alpha­bet­i­cal or­der here.)

    • Check (click or hover over) at least one cita­tion to see where the in­for­ma­tion comes from on a page you are reading

    • Check how many pageviews a page is get­ting (on an ex­ter­nal site or through the Pageview API)

    • Click through/​look for at least one cited source to ver­ify the in­for­ma­tion on a page you are reading

    • Edit a page you are read­ing be­cause of gram­mat­i­cal/​ty­po­graph­i­cal er­rors on the page

    • Edit a page you are read­ing to add new information

    • Look at the “See also” sec­tion for ad­di­tional ar­ti­cles to read

    • Look at the edit­ing his­tory of a page you are reading

    • Look at the edit­ing his­tory solely to see if a par­tic­u­lar user wrote the page

    • Look at the talk page of a page you are reading

    • Read a page mostly for the “Crit­i­cisms” or “Re­cep­tion” (or similar) sec­tion, to un­der­stand differ­ent views on the subject

    • Share the page with a friend/​ac­quain­tance/​coworker

For the Sur­veyMon­key au­di­ence, there were also some de­mo­graphic ques­tions (age, gen­der, house­hold in­come, US re­gion, and de­vice type).

Sur­vey ques­tions for the sec­ond survey

For reference, here are the survey questions for the second survey. A dummy/mock-up version of the survey can be found here: https://www.surveymonkey.com/r/28BW78V.

The sur­vey in­tro­duc­tion said the fol­low­ing:

This sur­vey is in­tended to gauge Wikipe­dia use habits. Please try your best to an­swer all of the ques­tions, and make a guess if you’re not sure.

This sur­vey has 4 ques­tions across 3 pages.

In this sur­vey, “Wikipe­dia page” refers to a Wikipe­dia page in any lan­guage (not just the English Wikipe­dia).

And the ac­tual ques­tions:

  1. How many dis­tinct Wikipe­dia pages do you read (at least one sen­tence of) per week on av­er­age?

    • Fewer than 1

    • 1 to 10

    • 11 to 25

    • 26 or more

  2. Which of these ar­ti­cles have you read (at least one sen­tence of) on Wikipe­dia (se­lect all that ap­ply)? (Th­ese were dis­played in a ran­dom or­der ex­cept the last op­tion for each re­spon­dent, but dis­played in alpha­bet­i­cal or­der ex­cept the last op­tion here.)

    • Adele

    • Barack Obama

    • Bernie Sanders

    • China

    • Don­ald Trump

    • Google

    • Hillary Clinton

    • India

    • Japan

    • Justin Bieber

    • Justin Trudeau

    • Katy Perry

    • Tay­lor Swift

    • The Beatles

    • United States

    • World War II

    • None of the above

  3. What are some of the Wikipe­dia ar­ti­cles you have most re­cently read (at least one sen­tence of)? Feel free to con­sult your browser’s his­tory.

  4. Re­call a time when you were sur­prised that a topic did not have a Wikipe­dia page. What were some of these top­ics?

Sur­vey ques­tions for the third sur­vey (Google Con­sumer Sur­veys)

This sur­vey had ex­actly one ques­tion. The word­ing of the ques­tion was ex­actly the same as that of the first ques­tion of the sec­ond sur­vey.

  1. How many dis­tinct Wikipe­dia pages do you read (at least one sen­tence of) per week on av­er­age?

    • Fewer than 1

    • 1 to 10

    • 11 to 25

    • 26 or more

One slight difference was that whereas in the second survey, the order of the options was fixed, the third survey did a 50/50 split between that order and the exact reverse order. Such splitting is a best practice to deal with any order-related biases, while still preserving the logical order of the options. You can read more on the questionnaire design page of the Pew Research Center.
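
The forward/reverse split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Google Consumer Surveys implements it; the function name `options_for_respondent` and the seed are hypothetical.

```python
import random

# The four answer options, in their logical (ascending) order.
OPTIONS = ["Fewer than 1", "1 to 10", "11 to 25", "26 or more"]


def options_for_respondent(rng):
    """Return the options in forward or exact reverse order, each with probability 1/2.

    The logical ordering is preserved either way; only the direction changes.
    """
    if rng.random() < 0.5:
        return list(OPTIONS)
    return list(reversed(OPTIONS))


# Arbitrary seed, used only so the sketch is repeatable.
rng = random.Random(2016)
for respondent in range(4):
    print(respondent, options_for_respondent(rng))
```

Unlike a full shuffle, this never shows a respondent a scrambled scale such as "11 to 25" before "Fewer than 1", which is the point of splitting rather than randomizing.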

Results

In this sec­tion we pre­sent the high­lights from each of the sur­vey ques­tions. If you pre­fer to dig into the data your­self, there are also some ex­ported PDFs be­low pro­vided by Sur­veyMon­key. Most of the in­fer­ences can be made us­ing these PDFs, but there are some cases where ad­di­tional filters are needed to de­duce cer­tain per­centages.

We use the no­ta­tion “SnQm” to mean “sur­vey n ques­tion m”.

S1Q1: num­ber of Wikipe­dia pages read per week

Here is a table that sum­ma­rizes the data for Q1:

How many distinct Wikipedia pages do you read per week on average? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list.

| Response | SM | V | SSC | AM |
|---|---|---|---|---|
| less than 1 | 42% | 1% | 1% | 0% |
| 1 to 10 | 45% | 40% | 37% | 29% |
| 11 to 25 | 13% | 43% | 36% | 14% |
| 26 or more | 0% | 16% | 27% | 57% |

Here are some high­lights from the first ques­tion that aren’t ap­par­ent from the table:

  • Of the peo­ple who read fewer than 1 dis­tinct Wikipe­dia page per week (26 peo­ple), 68% were fe­male even though fe­males were only 48% of the re­spon­dents. (Note that gen­der data is only available for the Sur­veyMon­key au­di­ence.)

  • Fil­ter­ing for high house­hold in­come ($150k or more; 11 peo­ple) in the Sur­veyMon­key au­di­ence, only 2 read fewer than 1 page per week, al­though most (7) of the re­sponses still fall in the “1 to 10” cat­e­gory.

The com­ments in­di­cated that this ques­tion was flawed in sev­eral ways: we didn’t spec­ify which lan­guage Wikipe­dias count nor what it meant to “read” an ar­ti­cle (the whole page, a sec­tion, or just a sen­tence?). One com­ment ques­tioned the “low” ceiling of 26; in fact, I had ini­tially made the cut­offs 1, 10, 100, 500, and 1000, but Vipul sug­gested the fi­nal cut­offs be­cause he ar­gued they would make it eas­ier for peo­ple to an­swer (with­out hav­ing to look it up in their browser his­tory). It turned out this mod­ifi­ca­tion was rea­son­able be­cause the “26 or more” group was a minor­ity.

S1Q2: af­finity for Wikipe­dia in search results

We asked Q2, “On a search engine (e.g. Google) results page, do you explicitly seek Wikipedia pages, or do you passively click on Wikipedia pages only if they show up at the top of the results?”, to see to what extent people preferred Wikipedia in search results. The main implication of this for people who do content creation on Wikipedia is that if people do explicitly seek Wikipedia pages (for whatever reason), it makes sense to give them more of what they want. On the other hand, if people don’t prefer Wikipedia, it makes sense to update in favor of diversifying one’s content creation efforts while still keeping in mind that raw pageviews indicate that content will be read more if placed on Wikipedia (see for instance Brian Tomasik’s experience, which is similar to my own, or gwern’s page comparing Wikipedia with other wikis).

The fol­low­ing table sum­ma­rizes our re­sults.

On a search engine (e.g. Google) results page, do you explicitly seek Wikipedia pages, or do you passively click on Wikipedia pages only if they show up at the top of the results? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia.

| Response | SM | V | SSC | AM | H |
|---|---|---|---|---|---|
| Explicitly seek Wikipedia | 19% | 60% | 63% | 57% | 79% |
| Slight preference for Wikipedia | 29% | 39% | 34% | 43% | 20% |
| Just click on top results | 52% | 1% | 3% | 0% | 1% |

One error on my part was that I didn’t include an option for people who avoided Wikipedia or did something else. This became apparent from the comments. For this reason, the “Just click on top results” option might be inflated. In addition, some comments indicated a mixed strategy of preferring Wikipedia for general overviews while avoiding it for specific inquiries, so allowing multiple selections might have been better for this question.

S1Q3: sec­tion vs whole page

This ques­tion is rele­vant for Vipul and me be­cause the work Vipul funds is mainly whole-page cre­ation. If peo­ple are mostly read­ing the in­tro­duc­tion or a par­tic­u­lar sec­tion like the “Crit­i­cisms” or “Re­cep­tion” sec­tion (see S1Q5), then that forces us to con­sider spend­ing more time on those sec­tions, or to strengthen those sec­tions on weak ex­ist­ing pages.

Responses to this question were fairly consistent across different audiences, as can be seen in the following table.

Do you usually read a particular section of a page or the whole article? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list.

| Response | SM | V | SSC | AM |
|---|---|---|---|---|
| Section | 73% | 80% | 74% | 86% |
| Whole | 34% | 23% | 33% | 29% |

Note that peo­ple were al­lowed to se­lect more than one op­tion for this ques­tion. The com­ments in­di­cate that sev­eral peo­ple do a com­bi­na­tion, where they read the in­tro­duc­tory por­tion of an ar­ti­cle, then nar­row down to the sec­tion of their in­ter­est.

S1Q4: search func­tion­al­ity on Wikipe­dia and sur­prise at lack of Wikipe­dia pages

We asked about whether peo­ple use the search func­tion­al­ity on Wikipe­dia be­cause we wanted to know more about peo­ple’s ar­ti­cle dis­cov­ery meth­ods. The data is sum­ma­rized in the fol­low­ing table.

How often do you use the search functionality on Wikipedia? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia.

| Response | SM | V | SSC | AM | H |
|---|---|---|---|---|---|
| Several times per week | 8% | 14% | 32% | 57% | 55% |
| About once per week | 19% | 17% | 21% | 14% | 15% |
| About once per month | 15% | 13% | 14% | 0% | 3% |
| About once per several months | 13% | 12% | 9% | 14% | 5% |
| Never/almost never | 45% | 43% | 24% | 14% | 23% |

Many peo­ple noted here that rather than us­ing Wikipe­dia’s search func­tion­al­ity, they use Google with “wiki” at­tached to their query, Duck­DuckGo’s “!w” ex­pres­sion, or some browser con­figu­ra­tion to al­low a quick search on Wikipe­dia.

To be more thor­ough about dis­cov­er­ing peo­ple’s con­tent dis­cov­ery meth­ods, we should have asked about other meth­ods as well. We did ask about the “See also” sec­tion in S1Q5.

Next, we asked how of­ten peo­ple are sur­prised that there is no Wikipe­dia page on a topic to gauge to what ex­tent peo­ple no­tice a “gap” be­tween how Wikipe­dia ex­ists to­day and how it could ex­ist. We were cu­ri­ous about what ar­ti­cles peo­ple speci­fi­cally found miss­ing, so we fol­lowed up with S2Q4.

How often are you surprised that there is no Wikipedia page on a topic? SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia.

| Response | SM | V | SSC | AM | H |
|---|---|---|---|---|---|
| Several times per week | 2% | 0% | 2% | 29% | 6% |
| About once per week | 8% | 22% | 18% | 14% | 34% |
| About once per month | 18% | 36% | 34% | 29% | 31% |
| About once per several months | 21% | 22% | 27% | 0% | 19% |
| Never/almost never | 52% | 20% | 19% | 29% | 10% |

Two com­ments on this ques­tion (out of 59) – both from the SSC group – speci­fi­cally be­moaned dele­tion­ism, with one com­ment call­ing dele­tion­ism “a can­cer kil­ling Wikipe­dia”.

S1Q5: be­hav­ior on pages

This ques­tion was in­tended to gauge how of­ten peo­ple perform an ac­tion for a spe­cific page; as such, the fre­quen­cies are ex­pressed in page-rel­a­tive terms.

The fol­low­ing table pre­sents the scores for each re­sponse, which are weighted by the num­ber of re­sponses. The scores range from 1 (for ev­ery page) to 5 (never); in other words, the lower the num­ber, the more fre­quently one does the thing.

For what fraction of pages you read do you do the following? Note that the responses have been shortened here; see the “Survey questions” section for the wording used in the survey. Responses are sorted by the values in the SSC column. SM = SurveyMonkey audience, V = Vipul Naik’s timeline, SSC = Slate Star Codex audience, AM = Wikipedia Analytics mailing list, H = heavy users (26 or more articles per week) of Wikipedia.

| Response | SM | V | SSC | AM | H |
|---|---|---|---|---|---|
| Check ≥1 citation | 3.57 | 2.80 | 2.91 | 2.67 | 2.69 |
| Look at “See also” | 3.65 | 2.93 | 2.92 | 2.67 | 2.76 |
| Read mostly for “Criticisms” or “Reception” | 4.35 | 3.12 | 3.34 | 3.83 | 3.14 |
| Click through ≥1 source to verify information | 3.80 | 3.07 | 3.47 | 3.17 | 3.36 |
| Share the page | 4.11 | 3.72 | 3.86 | 3.67 | 3.79 |
| Look at the talk page | 4.31 | 4.28 | 4.03 | 3.00 | 3.86 |
| Look at the editing history | 4.35 | 4.32 | 4.12 | 3.33 | 3.92 |
| Edit a page for grammatical/typographical errors | 4.50 | 4.41 | 4.22 | 3.67 | 4.02 |
| Edit a page to add new information | 4.61 | 4.55 | 4.49 | 3.83 | 4.34 |
| Look at editing history to verify author | 4.50 | 4.65 | 4.48 | 3.67 | 4.73 |
| Check how many pageviews a page is getting | 4.63 | 4.88 | 4.96 | 3.17 | 4.92 |

The table above pro­vides a good rank­ing of how of­ten peo­ple perform these ac­tions on pages, but not the dis­tri­bu­tion in­for­ma­tion (which would re­quire three di­men­sions to pre­sent fully). In gen­eral, the more com­mon ac­tions (scores of 2.5–4) had re­sponses that clus­tered among “For some pages”, “For very few pages”, and “Never”, while the less com­mon ac­tions (scores above 4) had re­sponses that clus­tered mainly in “Never”.
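
To make the scoring concrete, here is a minimal Python sketch of how such a score could be computed, assuming it is the response-count-weighted average on the 1 (for every page) to 5 (never) scale described above. The counts in `example_counts` are hypothetical and are not taken from the survey data; the function name `weighted_score` is also just illustrative.

```python
# Scale described in the post: 1 = "For every page" ... 5 = "Never".
SCALE = {
    "For every page": 1,
    "For most pages": 2,
    "For some pages": 3,
    "For very few pages": 4,
    "Never": 5,
}


def weighted_score(counts):
    """Average scale value, weighted by how many respondents picked each choice."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to score")
    return sum(SCALE[choice] * n for choice, n in counts.items()) / total


# Hypothetical distribution for one action (not real survey data).
example_counts = {
    "For every page": 2,
    "For most pages": 3,
    "For some pages": 10,
    "For very few pages": 15,
    "Never": 20,
}

print(round(weighted_score(example_counts), 2))  # 198 / 50 = 3.96
```

A single number like this ranks the actions but hides the shape of the distribution: very different mixes of "For some pages" and "Never" can produce the same score.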

One com­ment (out of 43) – from the SSC group, but a differ­ent in­di­vi­d­ual from the two in S1Q4 – be­moaned dele­tion­ism.

S2Q1: num­ber of Wikipe­dia pages read per week

Note the word­ing changes on this ques­tion for the sec­ond sur­vey: “less” was changed to “fewer”, the clar­ifi­ca­tion “at least one sen­tence of” was added, and we ex­plic­itly al­lowed any lan­guage. We have also pre­sented the sur­vey 1 re­sults for the Sur­veyMon­key au­di­ence in the cor­re­spond­ing rows, but note that be­cause of the change in word­ing, the cor­re­spon­dence isn’t ex­act.

How many distinct Wikipedia pages do you read (at least one sentence of) per week on average? SM = SurveyMonkey audience with no demographic filters, CEYP = College-educated young people of SurveyMonkey, S1SM = SurveyMonkey audience with no demographic filters from the first survey.

| Response | SM | CEYP | S1SM |
|---|---|---|---|
| Fewer than 1 | 37% | 32% | 42% |
| 1 to 10 | 48% | 64% | 45% |
| 11 to 25 | 7% | 2% | 13% |
| 26 or more | 7% | 2% | 0% |

Comparing SM with S1SM, we see that, probably because of the wording, the percentages have drifted in the direction of more pages read. It might be surprising that the young educated audience seems to have a smaller fraction of heavy users than the general population. However, note that each group only had ~50 responses, and that we have no education information for the SM group.

S2Q2: mul­ti­ple-choice of ar­ti­cles read

Our in­ten­tion with this ques­tion was to see if peo­ple’s stated or re­called ar­ti­cle fre­quen­cies matched the ac­tual, re­vealed pop­u­lar­ity of the ar­ti­cles. There­fore we pre­sent the pageview data along with the per­centage of peo­ple who said they had read an ar­ti­cle.

Which of these articles have you read (at least one sentence of) on Wikipedia (select all that apply)? SM = SurveyMonkey audience with no demographic filters, CEYP = College-educated young people of SurveyMonkey. Columns “2016” and “2015” are desktop pageviews in millions. Note that the 2016 pageviews only include pageviews through the end of June. The rows are sorted by the values in the CEYP column followed by those in the SM column.

| Response | SM | CEYP | 2016 | 2015 |
|---|---|---|---|---|
| None | 37% | 40% | | |
| World War II | 17% | 22% | 2.6 | 6.5 |
| Barack Obama | 17% | 20% | 3.0 | 7.7 |
| United States | 17% | 18% | 4.3 | 9.6 |
| Donald Trump | 15% | 18% | 14.0 | 6.6 |
| Taylor Swift | 9% | 18% | 1.7 | 5.3 |
| Bernie Sanders | 17% | 16% | 4.3 | 3.8 |
| Japan | 11% | 16% | 1.6 | 3.7 |
| Adele | 6% | 16% | 2.0 | 4.0 |
| Hillary Clinton | 19% | 14% | 2.8 | 1.5 |
| China | 13% | 14% | 1.9 | 5.2 |
| The Beatles | 11% | 14% | 1.4 | 3.0 |
| Katy Perry | 9% | 12% | 0.8 | 2.4 |
| Google | 15% | 10% | 3.0 | 9.0 |
| India | 13% | 10% | 2.4 | 6.4 |
| Justin Bieber | 4% | 8% | 1.6 | 3.0 |
| Justin Trudeau | 9% | 6% | 1.1 | 3.0 |

Below are four plots of the data. Note that r_s de­notes Spear­man’s rank cor­re­la­tion co­effi­cient. Spear­man’s rank cor­re­la­tion co­effi­cient is used in­stead of Pear­son’s r be­cause the former is less af­fected by out­liers. Note also that the per­centage of re­spon­dents who viewed a page counts each re­spon­dent once, whereas the num­ber of pageviews does not have this re­stric­tion (i.e. du­pli­cate pageviews count), so we wouldn’t ex­pect the re­la­tion­ship to be en­tirely lin­ear even if the sur­vey au­di­ences were perfectly rep­re­sen­ta­tive of the gen­eral pop­u­la­tion.
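
For readers who want to recompute the rank correlation themselves, the following is a rough pure-Python sketch of Spearman’s coefficient (Pearson’s r applied to ranks, with tied values given their average rank). It uses the SM and 2016 columns from the table above, excluding the “None” row. The helper names are made up for this sketch, and it is not necessarily the procedure used to produce the plots.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's r for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's r_s: Pearson's r computed on the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# SM percentages and 2016 desktop pageviews (millions), in table row order.
sm_percent = [17, 17, 17, 15, 9, 17, 11, 6, 19, 13, 11, 9, 15, 13, 4, 9]
views_2016 = [2.6, 3.0, 4.3, 14.0, 1.7, 4.3, 1.6, 2.0,
              2.8, 1.9, 1.4, 0.8, 3.0, 2.4, 1.6, 1.1]

print("Spearman r_s:", round(spearman(sm_percent, views_2016), 2))
print("Pearson r:  ", round(pearson(sm_percent, views_2016), 2))
```

Printing both coefficients side by side gives a sense of how much a single large value (here, the 14.0 million 2016 pageviews for Donald Trump) can move Pearson’s r relative to the rank-based measure.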

[Figures: SM vs 2016 pageviews; SM vs 2015 pageviews; CEYP vs 2016 pageviews; CEYP vs 2015 pageviews]

S2Q3: free re­sponse of ar­ti­cles read

The most com­mon re­sponse was along the lines of “None”, “I don’t know”, “I don’t re­mem­ber”, or similar. Among the more use­ful re­sponses were:

S2Q4: free re­sponse of sur­prise at lack of Wikipe­dia pages

As with the pre­vi­ous ques­tion, the most com­mon re­sponse was along the lines of “None”, “I don’t know”, “I don’t re­mem­ber”, “Doesn’t hap­pen”, or similar.

The most use­ful re­sponses were classes of things: “par­tic­u­lar words”, “French plays/​books”, “Ran­dom peo­ple”, “ob­scure peo­ple”, “Spe­cific list pages of movie gen­res”, “For­eign ac­tors”, “var­i­ous in­sect pages”, and so forth.

S3Q1 (Google Con­sumer Sur­veys)

The sur­vey was cir­cu­lated to a tar­get size of 500 in the United States (no de­mo­graphic filters), and re­ceived 501 re­sponses.

Since there was only one ques­tion, but we ob­tained data filtered by de­mo­graph­ics in many differ­ent ways, we pre­sent this table with the columns de­not­ing re­sponses and the rows de­not­ing the au­di­ence seg­ments. We also in­clude the S1Q1SM, S2Q1SM, and S2Q1CEYP re­sponses for easy com­par­i­son. Note that S1Q1SM did not in­clude the “at least one sen­tence of” caveat. We be­lieve that adding this caveat would push peo­ple’s es­ti­mates up­ward.

If you view the Google Con­sumer Sur­veys re­sults on­line you will also see the 95% con­fi­dence in­ter­vals for each of the seg­ments. Note that per­centages in a row may not add up to 100% due to round­ing or due to peo­ple en­ter­ing “Other” re­sponses. For the en­tire GCS au­di­ence, ev­ery pair of op­tions had a statis­ti­cally sig­nifi­cant differ­ence, but for some sub­seg­ments, this was not true.

| Audience segment | Fewer than 1 | 1 to 10 | 11 to 25 | 26 or more |
|---|---|---|---|---|
| S1Q1SM (N = 62) | 42% | 45% | 13% | 0% |
| S2Q1SM (N = 54) | 37% | 48% | 7% | 7% |
| S2Q1CEYP (N = 50) | 32% | 64% | 2% | 2% |
| GCS all (N = 501) | 47% | 35% | 12% | 6% |
| GCS male (N = 205) | 41% | 38% | 16% | 5% |
| GCS female (N = 208) | 52% | 34% | 10% | 5% |
| GCS 18–24 (N = 54) | 33% | 46% | 13% | 7% |
| GCS 25–34 (N = 71) | 41% | 37% | 16% | 7% |
| GCS 35–44 (N = 69) | 51% | 35% | 10% | 4% |
| GCS 45–54 (N = 77) | 46% | 40% | 12% | 3% |
| GCS 55–64 (N = 69) | 57% | 32% | 7% | 4% |
| GCS 65+ (N = 50) | 52% | 24% | 18% | 4% |
| GCS Urban (N = 176) | 44% | 35% | 14% | 7% |
| GCS Suburban (N = 224) | 50% | 34% | 10% | 6% |
| GCS Rural (N = 86) | 44% | 35% | 14% | 6% |
| GCS $0–24K (N = 49) | 41% | 37% | 16% | 6% |
| GCS $25–49K (N = 253) | 53% | 30% | 10% | 6% |
| GCS $50–74K (N = 132) | 42% | 39% | 13% | 6% |
| GCS $75–99K (N = 37) | 43% | 35% | 11% | 11% |
| GCS $100–149K (N = 11) | 9% | 64% | 18% | 9% |
| GCS $150K+ (N = 4) | 25% | 75% | 0% | 0% |

We can see that the overall GCS data vindicates the broad conclusions we drew from SurveyMonkey data. Moreover, most GCS segments with a sufficiently large number of responses (50 or more) display a trend similar to that of the overall data. One exception is age: younger audiences seem slightly less likely to fall in the “Fewer than 1” category (i.e. to use Wikipedia very little), and older audiences seem slightly more likely to fall in it.
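
To get a feel for how much the segment sizes matter, here is a small Python sketch that computes approximate 95% intervals for the “Fewer than 1” share in a few rows of the table. It assumes a Wilson score interval applied to the rounded percentages and unweighted N; Google Consumer Surveys may compute its published intervals differently (for example with weighting), so these numbers are only indicative.

```python
from math import sqrt


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# (segment, share answering "Fewer than 1", N), taken from the table above.
segments = [
    ("GCS all", 0.47, 501),
    ("GCS 18-24", 0.33, 54),
    ("GCS 55-64", 0.57, 69),
    ("S2Q1CEYP", 0.32, 50),
]

for name, share, n in segments:
    low, high = wilson_interval(share, n)
    print(f"{name}: {share:.0%} (approx. 95% interval {low:.0%} to {high:.0%})")
```

The intervals for segments with around 50 responses are several times wider than the interval for the full GCS sample, which is why differences between small subsegments should be read cautiously.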

Sur­veyMon­key al­lows ex­port­ing of re­sponse sum­maries. Here are the ex­ports for each of the au­di­ences.

The Google Consumer Surveys survey results are available online at https://www.google.com/insights/consumersurveys/view?survey=o3iworx2rcfixmn2x5shtlppci&question=1&filter=&rw=1.

Sur­vey-mak­ing lessons

Not hav­ing any ex­pe­rience de­sign­ing sur­veys, and want­ing some rough re­sults quickly, I de­cided not to look into sur­vey-mak­ing best prac­tices be­yond the feed­back from Vipul. As the first sur­vey pro­gressed, it be­came clear that there were sev­eral defi­cien­cies in that sur­vey:

  • Ques­tion 1 did not spec­ify what counts as read­ing a page.

  • We did not specify which language Wikipedias we were considering (multiple people noted that they read Wikipedias in languages other than English).

  • Ques­tion 2 did not in­clude an op­tion for peo­ple who avoid Wikipe­dia or do some­thing else en­tirely.

  • We did not in­clude an op­tion to al­low peo­ple to re­lease their sur­vey re­sults.

Fur­ther questions

The two sur­veys we’ve done so far provide some in­sight into how peo­ple use Wikipe­dia, but we are still far from un­der­stand­ing the value of Wikipe­dia pageviews. Some re­main­ing ques­tions:

  • Could it be pos­si­ble that even on non-ob­scure top­ics, most of the views are by “elites” (i.e. those with out­sized im­pact on the world)? This could mean pageviews are more valuable than pre­vi­ously thought.

  • On S2Q1, why did our data show that CEYP was less en­gaged with Wikipe­dia than SM? Is this a limi­ta­tion of the small num­ber of re­sponses or of Sur­veyMon­key’s au­di­ences?

Fur­ther reading

Acknowledgements

Thanks to Vipul Naik for col­lab­o­ra­tion on this pro­ject and feed­back while writ­ing this doc­u­ment, and for sup­ply­ing the sum­mary sec­tion, and thanks to Ethan Bashkan­sky for re­view­ing the doc­u­ment. All im­perfec­tions are my own.

The writ­ing of this doc­u­ment was spon­sored by Vipul Naik. Vipul Naik also paid Sur­veyMon­key (for the cost of Sur­veyMon­key Au­di­ence) and Google Con­sumer Sur­veys.

Doc­u­ment source and versions

The source files used to com­pile this doc­u­ment are available in a GitHub Gist. The Git repos­i­tory of the Gist con­tains all ver­sions of this doc­u­ment since its first pub­li­ca­tion.

This doc­u­ment is available in the fol­low­ing for­mats:

License

This doc­u­ment is re­leased to the pub­lic do­main.