Some numbers related to c (the number of capabilities researchers):
In 2018 about 8,500 people attended NeurIPS and about 4,000 people attended ICML. There are about 2,000 researchers who work at Google AI, and in December 2017 there were reports that about 700 people in total worked at DeepMind, including about 400 with a PhD.
Turning this into a single estimate for “number of researchers” is tricky for the sorts of reasons that catherio gives. “Capabilities researchers” is a fuzzy category, and it’s not clear to what extent it should include people who are primarily working on applications of the current art, or people who are primarily working on advancing the state of the art in narrower subfields rather than in general AI capabilities. Also, obviously, only some fraction of the relevant researchers attended those conferences or work at those companies.
I’ll suggest 10,000 people as a rough order-of-magnitude estimate. I’d be surprised if the number that came out of a more careful estimation process wasn’t within a factor of ten of that.
After discussing this offline, I think the main argument that I laid out does not hold up well in the case of blackmail (though it works better for many other kinds of threats). The key bit is here:
if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too)
This only looks at the effects on Alice and on Bob, as a simplification. But with blackmail “carrying out the threat” means telling other people information about Bob, and that is often useful for those other people. If Alice tells Casey something bad about Bob, that will often be bad for Bob but good for Casey. So it’s not obviously negative sum for the whole world.
There’s a pretty simple economic argument for why blackmail is bad: it involves a negative-sum threat rather than a positive-sum deal. I was surprised to not see this argument in the econbloggers’ discussion; good to see it come up here. To lay it out succinctly and separate from other arguments:
Ordinarily, when two people make a deal we can conclude that it’s win-win because both of them chose to make the deal rather than just not interacting with each other. By default Alice would just act on her own preferences and completely ignore Bob’s preferences, and the mirror image for Bob, but sometimes they find a deal where they each give up something in return for the other person doing something that they value even more. With some simplifying assumptions, the worst case scenario is that they don’t reach a deal and they both break even (compared to if they hadn’t interacted), and if they do reach a deal then they both wind up better off.
With a threat, Alice has an alternative course of action available which is somewhat worse for Alice than her default action but much worse for Bob, and Alice tells Bob that she will do the alternative action unless Bob does something for Alice. With some simplifying assumptions, if Bob agrees to give in then their interaction is zero-sum (Alice gets a transfer from Bob), if Bob refuses and Alice carries out her threat then it is negative sum (Bob loses a lot and Alice loses something too), and if Bob refuses and Alice backs down then it’s zero sum (both take the default action).
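To make the zero-sum/negative-sum comparison concrete, here is a minimal sketch with made-up payoff numbers (every specific value is an assumption chosen purely for illustration, not something from the argument above):

```python
# Illustrative payoffs relative to the no-interaction default (all numbers made up):
# Alice's threat costs her 1 but costs Bob 10; the transfer Bob would hand over
# is worth 2 to each of them; an ordinary deal gains each of them 3.
outcomes = {
    "ordinary deal reached":    {"alice": +3, "bob": +3},   # win-win, positive sum
    "threat: Bob gives in":     {"alice": +2, "bob": -2},   # pure transfer, zero sum
    "threat: carried out":      {"alice": -1, "bob": -10},  # negative sum
    "threat: Alice backs down": {"alice": 0,  "bob": 0},    # default actions, zero sum
}

for name, payoff in outcomes.items():
    total = payoff["alice"] + payoff["bob"]
    print(f"{name:26} Alice {payoff['alice']:+d}, Bob {payoff['bob']:+d}, sum {total:+d}")
```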
Ordinary deals add value to the world and threats subtract value from the world, and blackmail is a type of threat.
If we remove some simplifying assumptions (e.g. no transaction costs, one-shot interaction) then things get more complicated, but mostly in ways that make ordinary deals better and threats worse. In the long run, deals bring people together as they seek out more interactions that could lead to win-win deals; deals encourage people to invest in abilities that make them more useful to other people, so that they’ll have more and better opportunities to make deals; and the benefits of deals must outweigh the transaction costs and risks involved, at least in expectation (otherwise people would just opt out of trying to make those deals). Threats, by contrast, push people apart as they seek to avoid negative-sum interactions; threats encourage people to invest in abilities that make them better able to harm other people; and transaction costs increase the badness of threats (turning zero-sum interactions into negative-sum ones) but don’t prevent those interactions unless they drive the threat-maker’s returns down far enough.
I think that there’s a spectrum between treating someone as a good source of conclusions and treating them as a good source of hypotheses.
I can have thoughts like “Carol looked closely into the topic and came away convinced that Y is true, so for now I’m going to act as if Y is probably true” if I take Carol to be a good source of conclusions.
Whereas if I took Alice to be a good source of hypotheses but not a good source of conclusions, then I would instead have thoughts like “Alice insists that Z is true, so Z seems like something that’s worth thinking about more.”
Giving someone epistemic tenure as a source of conclusions seems much more costly than giving them epistemic tenure as a source of hypotheses.
Huh? I am sufficiently surprised/confused by this example to want a citation.
Edit: The surprise/confusion was in reference to the pre-edit version of the above comment, and does not apply to the current edition.
I think we should take more care to separate the question of whether AI development will be decentralized from the question of whether decentralization is safer. It is not obvious to me whether a decentralized, economy-wide path to advanced AIs will be safer or riskier than a concentrated path within a single organization. The opening sentence of this question seems to carry the assumption that decentralized is safer (“Robin Hanson has argued that those who believe AI Risk to be a primary concern for humanity, are suffering from a bias toward thinking that concentration of power is always more efficient than a decentralised system”).
I think you mean 50⁄62 = 0.81?
Sometimes theory can open up possibilities rather than closing them off. In these cases, once you have a theory that claims that X is important, then you can explore different values of X and do local hill-climbing. But before that it is difficult to explore by varying X, either because there are too many dimensions or because there is some subtlety in recognizing that X is a dimension and being able to vary its level.
This depends on being able to have and use a theory without believing it.
This sounds most similar to what LWers call generalizing from one example or the typical mind fallacy and to what psychologists call the false-consensus effect or egocentric bias.
Here are relatively brief responses on these 3 particular points; I’ve made a separate comment laying out my thinking on metrics like the Big 5, which provides some context for these responses.
We have continued to collect measures like the ones in the 2015 longitudinal study. We are mainly analyzing them in large batches, rather than workshop to workshop, because the sample size isn’t big enough to distinguish signal from noise for single workshops. One of the projects that I’m currently working on is an analysis of a couple years of these data.
The 2017 impact report was not intended as a comprehensive account of all of CFAR’s metrics; it focused specifically on CFAR’s EA impact. So it looked at the data that were most directly related to CFAR alums’ impact on the world, and “on average alums have some increase in conscientiousness” seemed less relevant than the information that we did include. The first few paragraphs of the report say more about this.
I’m curious why you’re especially interested in Raven’s Progressive Matrices. I haven’t looked closely at the literature on it, but my impression is that it’s one of many metrics which are loosely related to the thing that we mean by “rationality.” It has the methodological advantage of being a performance score rather than self-report (though this is partially offset by the possibility of practice effects and effort effects). The big disadvantage is the one that Kaj pointed to: it seems to track relatively stable aspects of a person’s thinking skills, and might not change much even if a person made large improvements. For instance, I could imagine a person developing MacGyver-level problem-solving ability while having little or no change in their Raven’s score.
Here’s a sketch of my thinking about the usefulness of metrics like the Big 5 for what CFAR is trying to do.
It would be convenient if there was a definitive measure of a person’s rationality which closely matched what we mean by the term and was highly sensitive to changes. But as far as I can tell there isn’t one, and there isn’t likely to be one anytime soon. So we rely on a mix of indicators, including some that are more like systematic metrics, some that are more like individuals’ subjective impressions, and some that are in between.
I think of the established psychology metrics (Big 5, life satisfaction, general self-efficacy, etc.) as primarily providing a sanity check on whether the workshop is doing something, along with a very very rough picture of some of what it is doing. They are quantitative measures that don’t rely on staff members’ subjective impressions of participants, they have been validated (at least to some extent) in existing psychology research, and they seem at least loosely related to the effects that CFAR hopes to have. And, compared to other ways of evaluating CFAR’s impact on individuals, they’re relatively easy for an outsider to make sense of.
A major limitation of these established psychology metrics is that they haven’t been that helpful as feedback loops. One of the main purposes of a metric is to provide input into CFAR’s day-to-day and workshop-to-workshop efforts to develop better techniques and refine the workshop. That is hard to do with metrics like the ones in the longitudinal study, because of a combination of a few factors:
The results aren’t available until several months after the workshop, which would make for very slow feedback loops and iteration.
The results are too noisy to tell if changes from one workshop to the next are just random variation. It takes several workshops worth of data to get a clear signal on most of the metrics.
These metrics are only loosely related to what we care about. If a change to the workshop leads to larger increases in conscientiousness that does not necessarily mean that we want to do it, and when a curriculum developer is working on a class they are generally not that interested in these particular metrics.
These metrics are relatively general/coarse indicators of the effect of the workshop as a whole, not tied to particular inputs. So (for example) if we make some changes to the TAPs class and want to see if the new version of the class works better or worse, there isn’t a metric that isolates the effects of the TAPs class from the rest of the workshop.
(This is Dan from CFAR)
CFAR’s 2015 Longitudinal Study measured the Big 5 and some other standard psychology metrics. It did find changes including decreased neuroticism and increased conscientiousness.
Seems interesting to get data on:
Some group that isn’t heavily selected for intelligence / intellectual curiosity: skateboarders, protestors, professional hockey players...
Some non-STEM group that is selected for success based on mental skills: literature laureates, governors, …
Not sure which groups it would be easy to get data on.
There is also the option of looking into existing research on birth order to see what groups other people have already looked at.
Seems worth noting that nostalgebraist published this post in June 2017, which was (for example) before Eliezer’s post on toolbox thinking.
Now that we have data on LWers/SSCers, mathematicians, and physicists, if anyone wants to put more work into this I’d like to see them look someplace different. We don’t want to fall into the Wason 2-4-6 trap of only looking for birth order effects among smart STEM folks. We want data that can distinguish Scott’s intelligence / intellectual curiosity hypothesis from other possibilities, like some non-Big-5 personality difference or a general tendency for firstborns to be overrepresented across the population as a whole.
For each mathematician, actual firstbornness was coded as 0 or 1, and expected firstbornness as 1/n (where n is the number of children that their parents had). Then we just did a paired t-test, which is equivalent to subtracting actual minus expected for each data point and then doing a one sample t-test against a mean of 0. You can see this all in Eli’s spreadsheet here; the data are also all there for you to try other statistical tests if you want to.
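If anyone wants to replicate the procedure, here is a rough sketch of what that test looks like in Python (the few data points shown are made up for illustration; the actual data are in the linked spreadsheet):

```python
import numpy as np
from scipy import stats

# Made-up example data: 1 if the mathematician is the firstborn, else 0,
# and n = number of children in their family, so expected firstbornness is 1/n.
actual = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)
n_children = np.array([2, 3, 2, 1, 4, 2, 2, 3], dtype=float)
expected = 1.0 / n_children

# Paired t-test of actual vs. expected firstbornness...
print(stats.ttest_rel(actual, expected))

# ...which gives the same result as a one-sample t-test of the
# per-mathematician differences (actual minus expected) against a mean of 0.
print(stats.ttest_1samp(actual - expected, 0.0))
```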
You could think of CEV applied to a single unitary agent as a special case where achieving coherence is trivial. It’s an edge case where the problem becomes easier, rather than an edge case where the concepts threaten to break.
This terminology does make it harder, though, to talk about several agents who each separately have their own extrapolated volition (as you were trying to do in your original comment in this thread). And replacing it with Personal Extrapolated Volition only helps a little if we also want to talk about several separate groups that each have their own within-group extrapolated volition (coherent within each group but not between groups).
Looking at the math of dividing a fixed pool of resources among a non-fixed number of people, a feature of log(r) that matters a lot is that it is negative for r<1 (with log(1)=0). The first unit of resources that you give to a person is essentially wasted, because it just gets them up to 0 utility (which is no better than just having 1 fewer person around).
That favors having fewer people, so that you don’t have to keep wasting that first unit of resource on each person. If the utility function for a person in terms of their resources was f(r)=r-1 you would similarly find that it is best not to have too many people (in that case having exactly 1 person would work best).
Whereas if it was f(r)=sqrt(r) then it would be best to have as many people as possible, because you’re starting from 0 utility at 0 resources and sqrt is steepest right near 0. Doing the calculation… if you have R units of resources divided equally among N people, the total utility is sqrt(RN). log(1+r) is similar to sqrt—it increases as N increases—but it is bounded if R is fixed and just approaches that bound (if we use natural log, that bound is just R).
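Spelling out these calculations, with R units of resources split equally so each of N people gets r = R/N (the maximizer for the log case is my own quick calculation, included for illustration):

```latex
f(r)=\sqrt{r}: \quad U(N) = N\sqrt{R/N} = \sqrt{RN}, \text{ increasing without bound in } N

f(r)=r-1: \quad U(N) = N\,(R/N - 1) = R - N, \text{ maximized at } N = 1

f(r)=\ln(1+r): \quad U(N) = N\ln(1 + R/N) \to R \text{ as } N \to \infty

f(r)=\ln r: \quad U(N) = N\ln(R/N), \text{ maximized at } N = R/e
```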
To sum up: diminishing marginal utility favors having more people each with fewer resources (in addition to favoring equal distribution of resources), f(0)<0 favors having fewer people each with more resources (to avoid “wasting” the bit of resources that gets a person up to 0 utility), and functions with both features, like log(r), favor some intermediate solution with a moderate population size.
Total utilitarianism does imply the repugnant conclusion, very straightforwardly.
For example, imagine that world A has 1000000000000000000 people each with 10000000 utility and world Z has 10000000000000000000000000000000000000000 people each with 0.0000000001 utility. Which is better?
Total utilitarianism says that you just multiply. World A has 10^18 people x 10^7 utility per person = 10^25 total utility. World Z has 10^40 people x 10^-10 utility per person = 10^30 total utility. World Z is way better.
This seems repugnant; intuitively world Z is much worse than world A.
Parfit went through cleverer steps because he wanted his argument to apply more generally, not just to total utilitarianism. Even much weaker assumptions can get to this repugnant-seeming conclusion that a world like Z is better than a world like A.
The point is that lots of people are confused about axiology. When they try to give opinions about population ethics, judging in various scenarios whether one hypothetical world is better than another, they’ll wind up making judgments that are inconsistent with each other.
The paragraph that I was quoting from was just about diminishing marginal utility and equality/redistribution, not about the repugnant conclusion in particular.