The Stack Overflow of Factored Cognition

Abstract

Factored cognition is a possible basis for building aligned AI. Currently Ought runs small-scale experiments with it. In this article I sketch some benefits of building a system for doing large-scale experiments and generating large amounts of data for ML training. Then I estimate roughly how long it would take to build such a system. I’m not confident that this exploration is useful at all. But at least I wrote it down.

Benefits

If you want to know what factored cognition is, see here.

Ought does small-scale experiments with factored cognition (cf. Ought’s Progress Update Winter 2018). I thought: wouldn’t it be nice to do these experiments at a much larger scale? With enough users that one root question could be answered within three hours, at any time on any day of the week.

Benefits:

  • The feedback loop would be much tighter than with the weekly or bi-weekly experiments that Ought runs now. A tight feedback loop is great in many ways. For example, it would allow a researcher to test more hypotheses more often, more quickly and more cheaply. This in turn helps her to generate more hypotheses overall.

    Note that I might be misunderstanding the goals and constraints of Ought’s experiments. In that case this benefit might be irrelevant.

  • It would generate a lot of data. These could be used as training data when we want to train an ML system to do factored cognition.

Quantifying these benefits is possible, but would take some weeks of modelling and talking with people. So far I’m not confident enough of the whole idea to make the effort.

Feasibility

We would need three things for a large-scale factored cognition system to work: the system itself, enough users, and useful behaviour from those users. I’ll use Stack Overflow as a basis for my estimates and call the large-scale factored cognition system ‘Fact Overflow’.

Building Stack Overflow took five months from the start of development to public beta. Then they spent a lot of time tweaking the system to make it more attractive and to maintain quality. So I’d say building Fact Overflow would take five to fifteen months with a team of two to five people.

To calculate how many users would be required, I used the following estimates (90 % confidence intervals, uniformly distributed):

variable – 5 % – 95 % – explanation
w – 15 – 300 – average number of workspaces per tree
a – 1 – 5 – average number of actions per workspace
d – 0.1 – 0.7 – decontamination factor
s – 0.1 – 0.7 – share of active users among all users
f – 1 – 10 – average frequency of actions per active user

(I had to insert dashes to make the table look neat.)

The decontamination factor d is the share of workspaces in a tree that one user can work on without being contaminated, i.e. without getting clues about the context of some workspaces. For example, with d = 0.3, any single user may work on at most 30 % of a tree’s workspaces.

The estimates are sloppy and probably overconfident. If people show interest in this topic, I will make them tighter and better calibrated.

Now if we want a tree of workspaces to be finished within a time t, we need n users, where:

$$n = \frac{w \cdot a}{d \cdot s \cdot f \cdot t}$$

A Guesstimate model based on this formula tells me that for t = 3 hours we need between 600 and 36 k users. Note that Guesstimate runs only 5000 samples, so the numbers jump around with each page reload. Note also that the actual time to finish a tree might be longer, depending on how long users take for each action and how many sub-questions have to be worked on in sequence.
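For readers who want to play with the numbers outside Guesstimate, here is a minimal Monte Carlo sketch in Python. It relies on my reconstruction of the formula above, treats each 90 % interval from the table as the full range of a uniform distribution (a simplification), and leaves the time unit of f and t open, so its percentiles need not match the Guesstimate model exactly.

```python
# Monte Carlo sketch of the user-count model. Assumptions: the formula
# n = (w * a) / (d * s * f * t) as reconstructed above, uniform distributions
# over the intervals from the table, and f and t in the same time unit.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # more samples than Guesstimate's 5000, so the output is stabler

w = rng.uniform(15, 300, N)   # average number of workspaces per tree
a = rng.uniform(1, 5, N)      # average number of actions per workspace
d = rng.uniform(0.1, 0.7, N)  # decontamination factor
s = rng.uniform(0.1, 0.7, N)  # share of active users among all users
f = rng.uniform(1, 10, N)     # frequency of actions per active user
t = 3                         # target time to finish a tree, in f's time unit

n = (w * a) / (d * s * f * t)  # users needed to finish a tree within t

print(np.percentile(n, [5, 50, 95]))  # 5th percentile, median, 95th percentile
```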

How long would it take to accumulate this many users? For this I use the number of sign-ups to Stack Exchange (of which Stack Overflow is the largest part). Let me assume that between 75 % and 98 % of people who sign up actually become users. That means between 700 and 42 k sign-ups are required. This calculation is also in the Guesstimate model. What I can’t include in the Guesstimate simulation is the difference between the growth rates of Stack Overflow and Fact Overflow. Assume that it takes Fact Overflow twice as long as Stack Overflow to reach a given number of sign-ups. Then it would take one month to reach 700 sign-ups and twenty-two months to reach 42 k sign-ups.
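The step from users to sign-ups is one more sampled division. Continuing the sketch above (it reuses rng, N and n from there; the 75–98 % range is the assumption stated in the previous paragraph):

```python
# Continuation of the sketch above; reuses rng, N and n from there.
conversion = rng.uniform(0.75, 0.98, N)  # share of sign-ups who become users
signups = n / conversion                 # sign-ups required
print(np.percentile(signups, [5, 95]))   # compare with Guesstimate's 700 to 42 k
```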

Of course, the system would have to be useful and fun enough to retain that many users. As with Stack Overflow, the software and the community have to encourage and ensure that the users behave in a way that makes factored cognition work.

Conclusion

It would be useful to be able to experiment with factored cognition at a large scale. I can’t quantify the usefulness quickly, but I did quantify very roughly what it would take: five to fifteen months of development effort with a small team plus one to twenty-two months of accumulating users.

Comment prompts

  • What do you think I’m misunderstanding?

  • Do you think my exploration of large-scale factored cognition is a waste of time? If so, why?

  • Do you think one could build a platform attractive enough to draw that many users? If so, how? What kinds of questions and topics would be inclusive enough to gain critical mass, yet exclusive enough to maintain quality?