The Stack Overflow of Factored Cognition

Factored cognition is a possible basis for building aligned AI. Currently Ought runs small-scale experiments with it. In this article I sketch some benefits of building a system for running large-scale experiments and generating large amounts of data for ML training. Then I estimate roughly how long it would take to build such a system. I’m not confident that this exploration is useful at all. But at least I wrote it down.
Benefits
If you want to know what factored cognition is, see
here.
Ought does small-scale experiments with factored cognition (cf. Ought’s Progress
Update Winter 2018). I thought: wouldn’t it be nice to run these experiments at a much larger scale? With enough users, one root question could be answered within three hours, any time and any day of the week.
Benefits:
The feedback loop would be much tighter than with the weekly or bi-weekly
experiments that Ought runs now. A tight feedback loop is great in many ways.
For example, it would allow a researcher to test more hypotheses more
often, more quickly and more cheaply. This in turn helps her to generate more
hypotheses overall.
Note that I might be misunderstanding the goals and constraints of Ought’s
experiments. In that case this benefit might be irrelevant.
It would generate a lot of data. These could be used as training data when we
want to train an ML system to do factored cognition.
Quantifying these benefits is possible, but would take some weeks of modelling
and talking with people. So far I’m not confident enough of the whole idea to
make the effort.
Feasibility
We would need three things for a large-scale factored cognition system to work:
the system itself, enough users and useful behaviour of these users. I’ll use
Stack Overflow as a basis for my estimates and call large-scale factored
cognition ‘Fact Overflow’.
Building Stack Overflow took five months from the start of development to public beta. Then the team spent a lot of time tweaking the system to make it more attractive and to maintain quality. So I’d say building Fact Overflow would take five to fifteen months with a team of two to five people.
For calculating how many users would be required, I used the following estimates (90 % confidence intervals, uniformly distributed):

variable | 5 % | 95 % | explanation
n_w      | 15  | 300  | average number of workspaces per tree
n_a      | 1   | 5    | average number of actions per workspace
x_c      | 0.1 | 0.7  | decontamination factor
x_a      | 0.1 | 0.7  | share of active users among all users
f_a      | 1   | 10   | average frequency of actions per active user per day
x_c is the share of workspaces in a tree that one user can work on without being contaminated, i.e. without getting clues about the context of some workspaces.
The estimates are sloppy and probably overconfident. If people show interest in
this topic, I will make them tighter and better calibrated.
Now if we want a tree of workspaces to be finished within t_f, we need n_u* users, where:

n_u* = (n_w · n_a) / (x_c · x_a · f_a · t_f)
A Guesstimate model based on this formula tells me that for t_f = 3 h we need between 600 and 36 k
users. Note that Guesstimate runs only 5000 samples, so the numbers jump around
with each page reload. Note also that the actual time to finish a tree might be
longer, depending on how long users take for each action and how many
sub-questions have to be worked on in sequence.
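The formula above can be turned into a small Monte Carlo sketch of the kind of calculation the Guesstimate model does. This is my own reconstruction, not Ought’s or Guesstimate’s code, and it assumes t_f is measured in days and f_a in actions per active user per day, so its interval endpoints won’t match the Guesstimate output exactly:

```python
import random

def users_needed(n_samples=5000, seed=0):
    """Sample each variable uniformly from its 90 % interval and compute
    n_u* = (n_w * n_a) / (x_c * x_a * f_a * t_f) for each sample.
    Returns the 5th and 95th percentile of the resulting distribution."""
    rng = random.Random(seed)
    t_f = 3 / 24  # target time to finish a tree: 3 hours, expressed in days
    samples = []
    for _ in range(n_samples):
        n_w = rng.uniform(15, 300)   # workspaces per tree
        n_a = rng.uniform(1, 5)      # actions per workspace
        x_c = rng.uniform(0.1, 0.7)  # decontamination factor
        x_a = rng.uniform(0.1, 0.7)  # share of active users among all users
        f_a = rng.uniform(1, 10)     # actions per active user per day
        samples.append((n_w * n_a) / (x_c * x_a * f_a * t_f))
    samples.sort()
    return samples[int(0.05 * n_samples)], samples[int(0.95 * n_samples)]
```

Like Guesstimate with its 5000 samples, the endpoints jump around a bit from run to run (here controlled by the seed); the spread of two orders of magnitude between the percentiles is the robust part of the result, not the exact numbers.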
How long would it take to accumulate these numbers of users? For this I use the
number of sign-ups to Stack
Exchange
(of which Stack Overflow is the largest part). Let me assume that between 75 %
and 98 % of people who sign up actually become users. That means between 700 and
42 k sign-ups are required. This is also in Guesstimate. What I can’t include in
the Guesstimate simulation is the difference between the growth rates of Stack
Overflow and Fact Overflow. Assume that it takes Fact Overflow twice as long as
Stack Overflow to reach a certain number of sign-ups. Then it would take one
month to reach 700 sign-ups and twenty-two months to reach 42 k sign-ups.
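The conversion step from users to sign-ups can be written as a back-of-the-envelope check (this is my own sketch, not the Guesstimate simulation, which propagates the full distributions rather than interval endpoints):

```python
def signups_needed(users, conv_low=0.75, conv_high=0.98):
    """Sign-ups required to end up with `users` users, given that only a
    fraction between conv_low and conv_high of sign-ups become users."""
    # Best case: almost everyone who signs up becomes a user (divide by 0.98).
    # Worst case: only 75 % of sign-ups do (divide by 0.75).
    return users / conv_high, users / conv_low

# Applied to the 600 to 36 k user estimate:
low_best, low_worst = signups_needed(600)        # ~612 to 800 sign-ups
high_best, high_worst = signups_needed(36_000)   # ~36.7 k to 48 k sign-ups
```

Dividing the endpoints directly gives roughly 610 to 48 k sign-ups; the 700 and 42 k figures in the text come from the Guesstimate simulation.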
Of course, the system would have to be useful and fun enough to retain that many
users. As with Stack Overflow, the software and the community have to encourage
and ensure that the users behave in a way that makes factored cognition work.
Conclusion
It would be useful to be able to experiment with factored cognition at a large
scale. I can’t quantify the usefulness quickly, but I did quantify very roughly
what it would take: five to fifteen months of development effort with a small
team plus one to twenty-two months of accumulating users.
Comment prompts
What do you think I’m misunderstanding?
Do you think my exploration of large-scale factored cognition is a waste of
time? If so, why?
Do you think one could build a platform attractive enough to win that many users?
If so, how? What kinds of questions and topics would be inclusive enough to
gain critical mass and exclusive enough to maintain quality?