Review of Q&A [LW2.0 internal document]

Context

1. This is the first in a series of internal LessWrong 2.0 team documents we are sharing publicly (with minimal editing) in an effort to help keep the community up to date with what we’re thinking about and working on.

2. Caveat! This is an internal document and does not represent any team consensus or conclusions; it was written by me (Ruby) alone and expresses my in-progress understanding and reasoning. To the extent that the models/arguments of the other team members are included here, they’ve been filtered through me and aren’t necessarily captured with high fidelity or strong endorsement. Since it was written on March 17th, it isn’t even up to date with my own thinking.

3. I, Ruby (Ruben Bloom), am trialling with the LessWrong 2.0 team in a generalist/​product/​analytics capacity. Most of my work so far has been trying to help evaluate the hypothesis that Q&A is a feasible mechanism to achieve intellectual progress at scale. I’ve been talking to researchers; thinking about use-cases, personas, and jobs to be done; and examining the data so far.

Epistemic status: this is one of the earlier documents I wrote in thinking about Q&A and my thinking has developed a lot since, especially since interviewing multiple researchers across EA orgs. Subsequent documents (to be published soon) have much more developed thoughts.

In particular, subsequent docs have a much better analysis of the uncertainties and challenges of making Q&A work than this one. This document is worth reading in addition to them mostly for an introduction to thinking about the different kinds of questions, our goals, and how things are going so far.

Originally written March 17th

I’ve been thinking a lot about Q&A the past week since it’s a major priority for the team right now. This doc contains a dump of many of my thoughts. In thinking about Q&A, it also occurred to me that an actual marketplace for intellectual labor could do a lot of good and is strong in a number of places where Q&A is weak. This document also describes that vision and why I think it might be a good idea.

1. Observations of Q&A so Far.

First off, pulling some stats from my analytics report (numbers as of 2019-03-11):

How long has Q&A been live?
Since 2018-12-07. Just about 3 months as of 2019-03-11 (94 days)
How many questions?
94 questions published + 20 drafts
How many answers?
191 answers, 171 direct comments on answers
How many people asking questions?
59 distinct usernames posted questions (including the LW team)
How many people answering questions?
117 unique usernames posting answers
172 unique usernames who answered or posted a direct comment on a question
How many people engaging overall?
Including questions, answers, and comments, 226 usernames have actively engaged with Q&A.
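As a quick sanity check on the figures above, the rates they imply can be worked out directly. This is only a sketch using the numbers quoted in this section; the variable names are mine and are not taken from the analytics report.

```python
from datetime import date

# Figures quoted in the stats above (as of 2019-03-11).
launch = date(2018, 12, 7)
snapshot = date(2019, 3, 11)
questions_published = 94
answers = 191
answerers = 117

# Derived rates; these are simple ratios, not anything from the report itself.
days_live = (snapshot - launch).days
questions_per_day = questions_published / days_live
answers_per_question = answers / questions_published
answers_per_answerer = answers / answerers

print(f"days live: {days_live}")
print(f"questions per day: {questions_per_day:.2f}")
print(f"answers per question: {answers_per_question:.2f}")
print(f"answers per answerer: {answers_per_answerer:.2f}")
```

So the published questions work out to about one per day, with roughly two answers per question on average.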

Note: “viewCount” is a little unreliable on LW2 (I think it might double-count sometimes); “num_distinct_viewers” refers only to logged-in viewers.

Spreadsheet of Questions as of 2019-03-08

List of Q&A Uncertainties

See Appendix for all my workings on the Whiteboard

Q&A might be a single feature/​product in the UI and in the codebase, but there are multiple distinct uses for the single feature. Different people trying to accomplish different things. Going over the questions, I see rough clusters, listed pretty much in order of descending prevalence:

  1. Asking for recommendations, feedback, and personal experience.

    1. “Which self-help has helped you?”, “Is there a way to hire academics hourly?”

  2. Asking for personal advice.

    1. “What’s the best way to improve my English pronunciation?”

  3. Asking conceptual/philosophy/ideology/models/theory type questions.

    1. “What are the components of intellectual honesty?”

  4. Asking for opinions

    1. “How does OpenAI’s language model affect our AI timeline estimates?”

  5. Asking for help studying a topic.

    1. “What are some concrete problems about logical counterfactuals?”, “Is the human brain a valid choice of Universal Turing Machine . . . ?”

  6. Asking general research/​lit-review-ish questions (not sure how to name)

    1. “Does anti-malaria charity destroy the local anti-malaria industry?”, “Understanding Information Cascades”, “How large is the fallout area of the biggest cobalt bomb we can build?”

  7. Asking open research-type questions (not sure how to name this cluster)

    1. “When is CDT Dutch-Bookable?”, “How does Gradient Descent Interact with Goodhart?”, “Understanding Information Cascades”

These questions are roughly ordered from “high prevalence + easier to answer” to “low prevalence + harder to answer”.

A few things stick out. I know the team has noticed them already, but I want to list them here anyway as part of the bigger argument. The questions which are most prevalent are those which are:

  1. relatively quick to ask, e.g. write a few paragraphs at most.

  2. there is a [relatively] large population of people who are qualified to answer.

  3. the kind of questions people are used to asking elsewhere, e.g. CFAR Alumni Mailing List, Facebook, Reddit, LessWrong (posts and comments), Quora, StackExchange.

  4. the kinds of questions for which there are existing forums, as above.

  5. they can be answered primarily using the answerer’s existing knowledge, e.g. people who answer advanced math problems but do so using their existing understanding.

  6. the questions can be answered in a single session at one’s computer, often without needing even to open another browser tab.

What is apparent is that questions which break from the above trends, e.g. questions which can be hard to explain (taking a long time to write up), require skill/expertise to answer, can’t be answered purely from an answerer’s existing knowledge (unless by fluke they’re expert in a niche area), and require more effort than simply typing an answer or explanation, are really of a very different kind. They’re a very different category, and both asking and answering such questions is a very different activity from asking the other kind.

What we see is that LessWrong’s Q&A is doing very well with the first kind—the kind of questions people are already used to asking and answering elsewhere. There’s been roughly a question per day for the three months Q&A has been live, but the overwhelming majority are requests for recommendations and advice, opinions, and philosophical discussion. Only a small minority (no more than a couple dozen) are solid research-y questions.

There’ve been a few of the “help me understand”/confusions type you might see on StackExchange (which I think are real good), and a few pure research-y type questions, but around half of those were asked by the LessWrong team and friends. That is around 10% of questions, really on the order of 10 questions or fewer in the last three months by my count.

I think these latter questions are more the sort we’d judge to be “actual serious intellectual progress”, or at least, those are the questions we’d love to see people asking more. They’re the kinds of questions that predominantly the LessWrong team is creating rather than users.

2. Our vision for Q&A is getting people to do a new and effortful thing. That’s hard.

The previous section can be summarized as follows:

  • Q&A has been getting used since it was launched, but primarily by people doing things they were already used to doing elsewhere, and things which are relatively low effort.

  • The vision for Q&A is scaling up intellectual progress on important problems. Doing real research. People taking their large questions, carving off pieces, people going off and making their own contributions of research (without hiring and all that overhead).

The thing about the LW vision for Q&A is that it means getting people to do a new and different thing from what they’re used to, plus that thing is way more effort. It’s not impossible, but it is hard.

It’s not a new and better way to do something they’re already doing, it’s a new thing they haven’t even dreamt of. Moreover, it looks like something else which they are used to, e.g. Quora, StackExchange, Facebook, so that’s how they use it and how they expect others to use it by default. The term “category creation” comes to mind, if that means anything. AirBnB was a new category. LessWrong is trying to create a new category, but it looks like existing categories.

3. Bounties: the potential solution and its challenges

The most straightforward way to get people to expend effort is to pay them. Or create the possibility of payment. Hence bounties. Done right, I think bounties could work, but I think it’s going to be a tough uphill battle to implement them in a way which does work.

[Edited: Raemon has asked a question about incentives/​bounties for answering “hard questions.” Fitting in this paradigm here, we’d really value further answers.]

4. Challenges facing bounties (and Q&A in general)

  • Even if we have a system which works well, it’s going to be new and different and we’re going to have to work to get users to understand it and adopt it. A lot of user education and training.

    • The closest analogue I can think of is Kaggle competitions, but they’re still pretty different: clear objective evaluation, you build transparently valuable skills, it feels good to get a high rank even if you don’t win, there are career rewards just for participating and doing relatively well.

  • Uncertainty around payment. People might do a lot of work for money, but the incentive is much weaker if you’re unsure if you’ll get paid. People decide whether it’s worth it based on EV, not the absolute number of dollars pledged.

    • And you might not be bothered to read and understand complicated bounty rules.

  • People with questions and placing bounties might not trust the research quality of whichever random person happens to answer.

    • A mathematical review can be checked, but it’s harder to do that with lit reviews and generalist research.

    • Evaluating research quality might require a significant fraction of the effort required to do the research in the first place.

  • People with questions might usually have a deeper, vaguer, general question they’re trying to answer. They want the actual thing answered, not a particular sub-question which may or may not be answered. Eli spoke of desiring that someone would become expert in a topic so they could then ask them lots of questions about it.

  • With Q&A, it’s challenging for the asker and answerer to have a good feedback loop as the answerer is working. It would seem to be harder for the answerer to ask clarifying questions and share intermediate results (and thereby get feedback), and harder for the asker to ask further follow-up questions. This gets worse once there are multiple people working on the question, all potentially needing further time and attention from the asker in order to do a good job.

  • Q&A (even with bounties) faces the two-sided marketplace problem. Question askers aren’t going to bother writing up large and difficult to explain questions if they don’t expect to get answered. (Even less so if they try once and fail to get a response worth the effort). Potential answerers aren’t going to make it a habit to research for a platform which doesn’t have many real research questions (mostly stuff about freeze dried mussel powder and English pronunciation and what not).
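The point above about uncertain payment can be made concrete with a little arithmetic. Below is a minimal sketch of the expected-value comparison a potential answerer might implicitly be making. All of the numbers, and the function and parameter names (`expected_payout`, `worth_attempting`, `p_paid`, `hourly_value_usd`), are hypothetical illustrations and not data from Q&A.

```python
def expected_payout(bounty_usd: float, p_paid: float) -> float:
    """Expected value of a bounty: the pledged amount times the chance of being paid."""
    if not 0.0 <= p_paid <= 1.0:
        raise ValueError("p_paid must be between 0 and 1")
    return bounty_usd * p_paid


def worth_attempting(bounty_usd: float, p_paid: float,
                     hours_needed: float, hourly_value_usd: float) -> bool:
    """True if the expected payout at least covers the value of the time spent."""
    return expected_payout(bounty_usd, p_paid) >= hours_needed * hourly_value_usd


# Hypothetical example: a large bounty with unclear rules and many competitors
# versus a smaller bounty where payment is close to certain.
large_uncertain = expected_payout(bounty_usd=1000, p_paid=0.2)
small_certain = expected_payout(bounty_usd=300, p_paid=0.9)

print(f"large but uncertain: ${large_uncertain:.0f} expected")
print(f"small but near-certain: ${small_certain:.0f} expected")
print(worth_attempting(1000, 0.2, hours_needed=10, hourly_value_usd=30))
print(worth_attempting(300, 0.9, hours_needed=8, hourly_value_usd=30))
```

Under these made-up numbers the smaller, more certain bounty is the stronger incentive, which is the sense in which the pledged dollar figure alone can mislead.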

5. What would it take to get it to work

Thinking about the challenges, it seems it could be made to work if the following happens:

  • We successfully get both askers and answerers to understand that LW’s Q&A is something distinctly different from other things they’re used to.

    • UI changes, tags, renaming things, etc. might all help, plus explanatory posts and hands-on training with people.

    • Making becoming a question answerer a high-status thing would certainly help. If Luke or Eliezer or Nate were seen using the platform, it might give it a lot of endorsement/legitimacy.

  • We successfully incentivize question answerers to expend the necessary effort to answer research questions.

    • This is partly through monetary reward, but might also include having them believe that they’re actually helping on something important, are actually getting status. (Weekly or monthly prizes for best answers—separate from bounties—might be a way to do that. Or heck, a leaderboard for Q&A contributions adjusted by karma.)

  • We get question askers to actually believe they can get real serious progress on questions for Q&A.

    • Easiest to do once we have some examples. Proof of concept goes a long way. Get a few wins and we talk about them with researchers, show them that it works.

      • It’s getting those first few examples which is going to be hardest. As they say, the first ten clients always require hustle.

  • We ensure that question askers have a positive ROI experience for all the time spent writing up questions, reading the responses, etc., etc.

  • We somehow address concerns that research might not be reliable because you don’t fully trust the research ability of people on the internet—especially not when you’re trying to make important decisions on the basis of your research.

Even then, I think getting it to work will depend on understanding which research questions can be well handled by this kind of system.

6. My Uncertainties/​Questions

Much of what I’m saying here is coming from thinking hard about Q&A for several days, using models from startups in general, and some limited user interaction. I could just be wrong about several of the assumptions being used above.

Some of the key questions I want answered to be more sure of my models are:

  • What are the beliefs/predictions/anticipations about Q&A of our ideal question askers?

    • In particular, do they think it could actually help them with their work? If not, why not? If yes, how?

    • Is trust a real issue for them? Are they worried about research quality?

    • Do they have “discrete” questions they can ask, or is it usually some deeper topic they want someone to spend several days becoming an expert on?

  • What is the willingness of ideal question answerers to answer questions on Q&A?

    • Which incentives matter to them? (Impact, status, money) How well do they view current Q&A as meeting them?

      • Do they feel like they’re actually doing valuable work in answering questions?

There are other questions, but that’s a starter.

7. Alternative Idea: Marketplace for Intellectual Labor

Once you’re talking about paying out bounties for people researching answers, you’re most of the way towards just outright hiring people to do work. A marketplace. TaskRabbit/​Craigslist for intellectual labor. I can see that being a good idea.

How it would work

  • People wanting to be hired have “profiles” on LW which include anything relevant to their ability to do intellectual labor. Links to CV, LinkedIn, questions answered on Q&A, karma, etc.

    • The profiles may be public or private or semi-private.

  • People seeking to hire intellectual labor can create “tasks”

  • Work can be assigned in two directions.

    • Hirers can post their tasks publicly and then people bid/​offer to work on the task.

    • Hirers can browse the list of people who have created profiles and reach out to people they’re interested in hiring, without ever needing to make their task or research public.
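The mechanics listed above can be sketched as a small data model. This is only an illustration of the idea and not a proposed design for the LessWrong codebase: every class, field, and method name below is hypothetical, and the rule that private profiles are hidden from browsing is my own assumption about what "private" would mean.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Profile:
    """A worker's profile (hypothetical fields)."""
    username: str
    karma: int = 0
    links: List[str] = field(default_factory=list)  # CV, LinkedIn, past answers
    visibility: str = "public"  # "public", "semi-private", or "private"


@dataclass
class Task:
    """A piece of intellectual labor someone wants to hire for."""
    hirer: str
    description: str
    public: bool = True
    bids: List[str] = field(default_factory=list)     # workers who offered
    invited: List[str] = field(default_factory=list)  # workers the hirer approached


class Marketplace:
    def __init__(self) -> None:
        self.profiles: Dict[str, Profile] = {}
        self.tasks: List[Task] = []

    def add_profile(self, profile: Profile) -> None:
        self.profiles[profile.username] = profile

    def post_task(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def public_tasks(self) -> List[Task]:
        return [t for t in self.tasks if t.public]

    def bid(self, username: str, task: Task) -> None:
        """Direction 1: a worker offers to take on a publicly posted task."""
        if not task.public:
            raise PermissionError("cannot bid on a task that was not posted publicly")
        if username not in self.profiles:
            raise KeyError(f"no profile for {username}")
        task.bids.append(username)

    def browse_profiles(self) -> List[Profile]:
        """Assumption: private profiles are hidden from browsing hirers."""
        return [p for p in self.profiles.values() if p.visibility != "private"]

    def invite(self, task: Task, username: str) -> None:
        """Direction 2: a hirer reaches out without making the task public."""
        if username not in self.profiles:
            raise KeyError(f"no profile for {username}")
        task.invited.append(username)


market = Marketplace()
market.add_profile(Profile("alice", karma=1200, links=["https://example.com/alice-cv"]))
market.add_profile(Profile("bob", karma=300, visibility="private"))

open_task = market.post_task(Task("hirer_1", "Lit review on a research question"))
market.bid("alice", open_task)

quiet_task = market.post_task(Task("hirer_2", "Confidential research task", public=False))
candidates = market.browse_profiles()
market.invite(quiet_task, candidates[0].username)
```

The second task never appears in `public_tasks()`, which is the property that lets a hirer keep their research private while still finding someone to work on it.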

Why this is a good idea

  • A marketplace is a really standard thing, people already have models and expectations for how they work and how to interact with them. In this case, it’s just a marketplace for a particular kind of thing, otherwise the mechanics are what people are used to. Say “TaskRabbit for research/​intellectual labor” and I bet people will get it.

    • Also, a marketplace and working for money is more probably the right genre for working hard for several days and deterministically getting paid. The thing about Q&A is that it’s somewhat trying to get people to do serious work via something which looks a lot like the things they do recreationally.

  • It reduces uncertainty around payment/​incentives. The two parties negotiate a contract (perhaps payment happens via LW) and the worker knows that they will get paid as much as in any real job they might be hired for.

  • It solves the trust thing. The hirers get to select who they trust with their research questions; it’s not open to anyone. The profiles are helpful for this, as a careful hirer can go through someone’s qualifications and past work to see if they trust them.

    • LessWrong could even create a “pipeline” of skill around the marketplace for intellectual labor. People start with simple, low-trust tasks and as they prove themselves and get good reviews, they’re more attractive.

  • It addresses privacy. You might not be willing to place your research questions on the public internet, but you might be willing to trust them to a single carefully vetted person whom you hire.

  • It addresses the two-sided marketplace challenge. The format allows you to build each side somewhat asynchronously.

    • Find a few people and convince them to create a few work tasks they’d like done but which aren’t urgent (approximately questions). Once they’re up there, you can say “yes, we’ve got some tasks that Paul Christiano would like answers on.”

    • Find people who would be interested in the right kind of work, get them to create profiles. They don’t have to commit to doing any particular work, they’re just there in case someone else wants to reach out to hire them. (One could imagine making it behave like Reciprocity.)

  • It lets you hire for things like tutoring.

    • Eli mentioned how much he values 1:1 interaction and tutoring. When he’s got a confusion, he seeks a tutor. That’s not something Q&A really supports, but a marketplace for intellectual labor could.

    • It could be an efficient way for people looking for knowledge from an expert to be able to find one who is available and at the right price.

      • I’ve seen spreadsheets over the years of EAs registering their names, interests, and skills. I don’t know if people ever used them, but it does seem neat if there was just a locked-in service which was a directory of experts on various topics that you could pay to access.

  • [Added] It diversifies the range of work you can hire for.

    • It seems good if people doing research work can hire people to format LaTeX, proofread, edit, and generally handle tasks, freeing them up for more core research.

  • [Added] It doesn’t limit the medium of the work to being the LessWrong platform. Once an arrangement is made, the hirer and worker are free to work in person, work via Google Doc, Skype, or whatever else is most convenient and native to their work. In contrast, Q&A makes the format and UI of the platform a bottleneck on communication of research.

    • Needing to write up results formally in a way that is suitable for the public is also a costly step that is avoided in a 1-to-1 work arrangement.

    • It does seem that CKEditor could dovetail really nicely with collaboration via the marketplace, assuming otherwise people are using Google Docs. Once the research content is already on LW, we can streamline the process of making it public.

      • Research being conducted in Google Docs and then polished and shared might be a much more natural flow than needing people to conduct research in whatever tools and then translate it into the format of comments/answers.

        • Another idea: building in things like citation management and other stuff to the LW Google Doc’s and building a generally great research

You could build up several dozen or hundred worker (laborer?) profiles before you approach highly acclaimed researchers and say “hey, we’ve got a list of people willing to offer intellectual labor, interested in taking a look?” Or “we’ve got tasks from X, Y, Z; would you like to look and see if you can help?”

[redacted]: “I’d help [highly respected person] with pretty much whatever.” Right now [highly respected person] has no easy way to reach out to people who might be able to do work for them. I’m sure X and Y [redacted] wouldn’t mind a better way for people to locate their services.

In the earlier stages, LessWrong could do a bit of matchmaking. Using our knowledge and connections to link up suitable people to tasks.

Existing services like this (where the platform is kind of a matchmaker) such as TaskRabbit and Handy struggle because people use the platform initially to find someone, e.g. a house cleaner, but then bypass the middleman to book subsequent services. But we’re not trying to make money off of this, we don’t need to be in the middle. If a task happens because of the LW marketplace and then two people have an ongoing work relationship—that is fantastic.

Crazy places where this leads

You could imagine this ending up with LessWrong playing the role of some meta-hirer/​recruiting agency type thing. People create profiles, upload all kinds of info, get interviewed—and then they are rated and ranked within the system. They then get matched with suitable tasks. Possibly only 5-10% of the entire pool ever gets work, but it’s more surface area on the hiring problem within EA.

80k might offer career advice, but they’re not a recruiting agency and they don’t place people.

Why it might not be that great (uncertainties)

It might turn out that all the challenges of hiring people generally apply when hiring just for more limited tasks, e.g. trusting them to do a good job. If it’s too much hassle to vet all the profiles vying to work on your task, learn how to interact with a new person around research, etc., then people won’t do it.

If it turns out that it is really hard to discretize intellectual work, then a marketplace idea is going to face the same challenges as Q&A. Both would require some solution of the same kind.

I’m sure there’s a lot more to go here. I’ve only spent a couple of hours thinking about this as of 3/17.

Q&A + Marketplace: Synergy

I think there could be some good synergies: ways in which each blends into and supports the other. Something I can imagine is that there’s a “discount” on intellectual labor hired if those engaged in the work allow it to be made public on LW. The work done through the marketplace gets “imported” as a Q&A where further people can come along and comment and provide feedback.

Or someone is answering your question and you like what they’ve said, but you want more. You could issue an “invite” to hire them to work more on your task. Here you’d get the benefits of a publicly posted question anyone can work on, plus the benefits of a dedicated person you’re paying and working closely with. This person, if they become an expert in the topic, could even begin managing the question thread, freeing up the important person who asked the question to begin with.

8. Appendix: Q&A Whiteboard Workings