2018 Review: Voting Results!

The votes are in!

59 of the 430 eligible voters participated, evaluating 75 posts. Meanwhile, 39 users submitted a total of 120 reviews, with most posts getting at least one review.

Thanks a ton to everyone who put in time to think about the posts – nominators, reviewers and voters alike. Several reviews substantially changed my mind about many topics and ideas, and I was quite grateful to the authors who participated in the process. I'll mention Zack_M_Davis, Vanessa Kosoy, and Daniel Filan as great people who wrote the most upvoted reviews.

In the coming months, the LessWrong team will write further analyses of the vote data, and use the information to form a sequence and a book of the best writing on LessWrong from 2018.

Below are the results of the vote, followed by a discussion of how reliable the results are and our plans for the future.

Top 15 posts

  1. Embedded Agents by Abram Demski and Scott Garrabrant

  2. The Rocket Alignment Problem by Eliezer Yudkowsky

  3. Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky

  4. Arguments about fast takeoff by Paul Christiano

  5. The Costly Coordination Mechanism of Common Knowledge by Ben Pace

  6. Toward a New Technical Explanation of Technical Explanation by Abram Demski

  7. Anti-social Punishment by Martin Sustrik

  8. The Tails Coming Apart As Metaphor For Life by Scott Alexander

  9. Babble by alkjash

  10. The Loudest Alarm Is Probably False by orthonormal

  11. The Intelligent Social Web by Valentine

  12. Prediction Markets: When Do They Work? by Zvi

  13. Coherence arguments do not imply goal-directed behavior by Rohin Shah

  14. Is Science Slowing Down? by Scott Alexander

  15. (tie) A voting theory primer for rationalists by Jameson Quinn and Robustness to Scale by Scott Garrabrant

Top 15 posts not about AI

  1. Local Validity as a Key to Sanity and Civilization by Eliezer Yudkowsky

  2. The Costly Coordination Mechanism of Common Knowledge by Ben Pace

  3. Anti-social Punishment by Martin Sustrik

  4. The Tails Coming Apart As Metaphor For Life by Scott Alexander

  5. Babble by alkjash

  6. The Loudest Alarm Is Probably False by orthonormal

  7. The Intelligent Social Web by Valentine

  8. Prediction Markets: When Do They Work? by Zvi

  9. Is Science Slowing Down? by Scott Alexander

  10. A voting theory primer for rationalists by Jameson Quinn

  11. Toolbox-thinking and Law-thinking by Eliezer Yudkowsky

  12. A Sketch of Good Communication by Ben Pace

  13. A LessWrong Crypto Autopsy by Scott Alexander

  14. Unrolling social metacognition: Three levels of meta are not enough. by Academian

  15. Varieties Of Argumentative Experience by Scott Alexander

Top 10 posts about AI

(The vote included 20 posts about AI.)

  1. Embedded Agents by Abram Demski and Scott Garrabrant

  2. The Rocket Alignment Problem by Eliezer Yudkowsky

  3. Arguments about fast takeoff by Paul Christiano

  4. Toward a New Technical Explanation of Technical Explanation by Abram Demski

  5. Coherence arguments do not imply goal-directed behavior by Rohin Shah

  6. Robustness to Scale by Scott Garrabrant

  7. Paul’s research agenda FAQ by zhukeepa

  8. An Untrollable Mathematician Illustrated by Abram Demski

  9. Specification gaming examples in AI by Vika

  10. 2018 AI Alignment Literature Review and Charity Comparison by Larks

The Complete Results

Click Here If You Would Like A More Comprehensive Vote Data Spreadsheet

To help users see the spread of the vote data, we’ve included swarmplot visualizations (a sketch of how such a plot can be generated follows the notes below).

  • For space reasons, only votes with weights between −10 and 16 are plotted. This covers 99.4% of votes.

  • Gridlines are spaced 2 points apart.

  • Concrete illustration: the plot for the top-ranked post (Embedded Agents) has 18 votes ranging in strength from −3 to 12.
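
Since the plots themselves can't be reproduced in text, here is a minimal sketch of how a similar swarmplot could be generated with seaborn. The weights in `example_votes` are made-up illustrative values, not the actual Review vote data.

```python
# Minimal sketch of a per-post vote swarmplot, assuming votes are available
# as a list of integer weights per post. The weights below are made up for
# illustration; they are not the actual Review vote data.
import matplotlib.pyplot as plt
import seaborn as sns

example_votes = [-3, -1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12]

# Clip to the plotted range used in the post (-10 to 16) for space reasons.
shown = [w for w in example_votes if -10 <= w <= 16]

ax = sns.swarmplot(x=shown)
ax.set_title("Embedded Agents")
ax.set_xlim(-10, 16)
ax.set_xticks(range(-10, 17, 2))  # gridlines spaced 2 points apart
ax.grid(axis="x")
plt.show()
```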

(Format: rank. post title (total score); outlier notes are given where applicable. The per-post “Vote Spread” swarmplots from the original post are not reproduced here.)

  1. Embedded Agents (209; one outlier vote of +17 not shown)
  2. The Rocket Alignment Problem (183)
  3. Local Validity as a Key to Sanity and Civilization (133)
  4. Arguments about fast takeoff (98)
  5. The Costly Coordination Mechanism of Common Knowledge (95)
  6. Toward a New Technical Explanation of Technical Explanation (91)
  7. Anti-social Punishment (90; one outlier vote of +20 not shown)
  8. The Tails Coming Apart As Metaphor For Life (89)
  9. Babble (85)
  10. The Loudest Alarm Is Probably False (84)
  11. The Intelligent Social Web (79)
  12. Prediction Markets: When Do They Work? (77)
  13. Coherence arguments do not imply goal-directed behavior (76)
  14. Is Science Slowing Down? (75)
  15. Robustness to Scale (74)
  15. A voting theory primer for rationalists (74)
  17. Toolbox-thinking and Law-thinking (73)
  18. A Sketch of Good Communication (72)
  19. A LessWrong Crypto Autopsy (71)
  20. Paul’s research agenda FAQ (70)
  21. Unrolling social metacognition: Three levels of meta are not enough. (69)
  22. An Untrollable Mathematician Illustrated (65)
  23. Specification gaming examples in AI (64)
  23. Will AI See Sudden Progress? (64)
  23. Varieties Of Argumentative Experience (64)
  26. Meta-Honesty: Firming Up Honesty Around Its Edge-Cases (62)
  27. My attempt to explain Looking, insight meditation, and enlightenment in non-mysterious terms (60)
  27. Naming the Nameless (60)
  27. Inadequate Equilibria vs. Governance of the Commons (60)
  30. 2018 AI Alignment Literature Review and Charity Comparison (57)
  31. Noticing the Taste of Lotus (55)
  31. On Doing the Improbable (55)
  31. The Pavlov Strategy (55)
  31. Being a Robust, Coherent Agent (V2) (55)
  35. Spaghetti Towers (54)
  36. Beyond Astronomical Waste (51)
  36. Research: Rescuers during the Holocaust (51)
  38. Open question: are minimal circuits daemon-free? (48)
  38. Decoupling vs Contextualising Norms (48; one outlier vote of +23 not shown)
  40. On the Loss and Preservation of Knowledge (47)
  41. Is Clickbait Destroying Our General Intelligence? (46)
  42. What makes people intellectually active? (43)
  43. Why everything might have taken so long (40)
  44. Challenges to Christiano’s capability amplification proposal (39)
  45. Public Positions and Private Guts (38)
  46. Clarifying “AI Alignment” (36)
  46. Expressive Vocabulary (36)
  48. Bottle Caps Aren’t Optimisers (34)
  49. Argue Politics* With Your Best Friends (32)
  50. Player vs. Character: A Two-Level Model of Ethics (30)
  51. Conversational Cultures: Combat vs Nurture (V2) (29)
  51. Act of Charity (29)
  53. Optimization Amplifies (27)
  53. Circling (27; one outlier vote of −17 not shown)
  55. Realism about rationality (25; two outliers of −30 and +18 not shown)
  55. Caring less (25)
  57. Lessons from the Cold War on Information Hazards: Why Internal Communication is Critical (24)
  57. The Bat and Ball Problem Revisited (24)
  59. Argument, intuition, and recursion (21)
  59. Unknown Knowns (21)
  61. Competitive Markets as Distributed Backprop (18)
  62. Towards a New Impact Measure (14)
  62. Explicit and Implicit Communication (14)
  62. On the Chatham House Rule (14)
  62. Historical mathematicians exhibit a birth order effect too (14)
  66. Everything I ever needed to know, I learned from World of Warcraft: Goodhart’s law (13)
  67. The funnel of human experience (11)
  68. Understanding is translation (9)
  69. Preliminary thoughts on moral weight (7)
  70. Metaphilosophical competence can’t be disentangled from alignment (3)
  71. Two types of mathematician (2)
  72. How did academia ensure papers were correct in the early 20th Century? (−2)
  73. Birth order effect found in Nobel Laureates in Physics (−5)
  74. Give praise (−10)
  75. Affordance Widths (−142; one outlier of −29 not shown)

How reliable is the output of this vote?

Most posts were voted on by 10-20 people (median: 17). A change of 10-15 points in a post's score is enough to move it up or down around 10 positions in the rankings – the equivalent of a few moderate-strength votes from two or three people, or one exceedingly strong vote from a single strongly-feeling voter. This means the system is somewhat noisy, though it seems very unlikely to me that the posts at the very top could have ended up placed much differently.

The vote was also affected by two technical mistakes the team made:

  1. The post-order was not randomized. For the first half of the voting period, the posts on the voting page appeared in order of number of nominations (fewest to most) instead of appearing randomly, thereby giving extra visual attention to the first ~15 or so posts (those with 2 nominations). Ruby looked into it and found that 15-30% more people cast votes on these earlier-appearing posts than on those appearing elsewhere in the list. Thanks to gjm for identifying this issue.

  2. Users were given some free negative votes. When calculating the cost of users' votes, we used a simple equation, but missed that it produced an off-by-one error for negative numbers. Essentially, users got one free negative vote-weight on every post they voted on negatively (see the sketch below). To correct for this, for those who had exceeded their budget (18 users in total) we reduced the strength of their negative votes by a single unit; those who had not spent all their points had their votes left unaffected. This didn't affect the rank-ordering very much: a few posts changed by 1 position, and a smaller number changed by 2-3 positions.
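
To illustrate the kind of bug this was, here is a minimal sketch. It assumes a quadratic cost rule of the form k(k+1)/2 for a vote of weight k; the exact equation the team used isn't given in this post, so treat the formula as an assumption. Applying such an expression directly to signed weights charges a weight −k vote as if it were weight −(k−1), i.e. one negative vote-weight comes free.

```python
# Illustrative sketch of an off-by-one in a vote-cost formula. The quadratic
# rule cost(k) = k*(k+1)/2 is an assumption, not necessarily the equation
# the LessWrong team actually used.

def buggy_cost(weight: int) -> int:
    # Intended: a vote of weight k costs k*(k+1)/2 points.
    # For negative weights this is off by one step:
    # buggy_cost(-1) == 0, buggy_cost(-2) == 1, buggy_cost(-3) == 3, ...
    return weight * (weight + 1) // 2

def fixed_cost(weight: int) -> int:
    # Cost should depend only on the vote's magnitude.
    k = abs(weight)
    return k * (k + 1) // 2

# buggy_cost(-k) equals fixed_cost(k - 1): every negative vote is charged
# as if it were one unit weaker, i.e. one negative vote-weight is free.
for w in (-3, -2, -1, 1, 2, 3):
    print(w, buggy_cost(w), fixed_cost(w))
```

Under this reading, the correction described above (reducing over-budget users' negative votes by a single unit) is exactly the inverse of the accidental discount.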

The effect size of these errors is not certain, since it's hard to know how people would have voted counterfactually. My sense is that the effect is pretty small, and that the majority of the noise in the system comes from elsewhere.

Finally, we discarded exactly one ballot, which spent 10,000 points on voting instead of the allotted 500. Had a user gone over by a small amount (e.g. 1-50 points), we had planned to scale their votes down to fit the budget. However, when someone's allocation was this extreme, we were honestly unsure what adjustment to their votes they would have wanted: had their points been normalised down to 500, the majority of their votes would have been reduced to zero. (This decision was made without knowing the user who cast the ballot or which posts were affected.)
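
For concreteness, here is one plausible reading of "scale their votes down", reusing the assumed quadratic cost rule from the earlier sketch. This is an illustration only, not the team's actual implementation.

```python
# Minimal sketch of scaling an over-budget ballot down to fit. The cost rule
# is the same assumed quadratic formula as in the earlier sketch.
BUDGET = 500

def cost(weight: int) -> int:
    k = abs(weight)
    return k * (k + 1) // 2

def scale_down(ballot: dict[str, int]) -> dict[str, int]:
    """Shrink all weights by a common factor until the ballot fits the budget."""
    factor = 1.0
    scaled = dict(ballot)
    while sum(cost(w) for w in scaled.values()) > BUDGET:
        factor *= 0.95
        scaled = {post: round(w * factor) for post, w in ballot.items()}
    return scaled
```

Run on a ballot spending 10,000 points, a procedure like this rounds most weights down to zero, which is the problem the paragraph above describes.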

Overall, I think the vote is a reliable indicator to within about 10 places in the rankings, but, for example, I wouldn't agonise over whether a post is at position #42 vs #43.

The Future

This has been the first LessWrong Annual Review. This project was started with the vision of creating a piece of infrastructure that would:

  1. Create common knowledge about how the LessWrong community feels about various posts and topics and the progress we've made.

  2. Improve our longterm incentives, feedback, and rewards for authors.

  3. Help create a highly curated "Best of 2018" Sequence and Book.

The vote reveals much disagreement between LessWrongers. Every post received at least five positive votes, and every post received at least one negative vote – except for An Untrollable Mathematician Illustrated by Abram Demski, which was evidently just too likeable – and many people had strongly different feelings about many posts. Many of these disagreements seem more interesting to me than the specific rankings of the posts.

In total, users wrote 207 nominations and 120 reviews, and many authors updated their posts with new thinking or clearer explanations, showing that both readers and authors reflected a lot (and, I think, changed their minds a lot) during the review period. I think all of this is great, and I like the idea of us having a Schelling time in the year for this sort of thinking.

Speaking for myself, this has been a fascinating and successful experiment – I've learned a lot. My thanks to Ray for pushing me and the rest of the team to actually do it this year, in a move-fast-and-break-things kind of way. The team will be conducting a Review of the Review, where we take stock of what happened, discuss the value and costs of the Review process, and think about how to make the review process more effective and efficient in future years.

In the coming months, the LessWrong team will write further analyses of the vote data, award prizes to authors and reviewers, and use the vote to help design a sequence and a book of the best writing on LW from 2018.

I think it's awesome that we can do things like this, and I was honestly surprised by the level of community participation. Thanks to everyone who helped out in the LessWrong 2018 Review – everyone who nominated, reviewed, voted and wrote the posts.