Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I’d guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., they affect the intended system in roughly the intended way, except with the sign of the key effect flipped—it seems disturbingly common for attempts to help to have “paradoxical impact.”
I suspect the risk of paradoxical impact—even from your own work—is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who, by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it’s worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.
I’m sorry to hear about your paradoxical impact; this sounds tough and it’s a fear I share. I feel a bit better about MATS’ impact because very few of our alumni work on AI capabilities at frontier labs (~2% by my estimation) and very few work at OpenAI at all, but I can understand if you feel that the 22% working on AI safety at for-profit companies are primarily doing “safetywashing” or something (on net I disagree, but it’s a valid concern).
I think there is something for me to learn from your experience: at the time MIRI was running AIRCS, OpenAI was not an AI safety pariah; it’s possible that some of the companies that MATS alums join now will become pariahs in future, revealing paradoxical impact. I’m not sure what to do about this other than encourage people to be intentional with their careers, question assumptions, and not “do evil” (the MATS values are impact first, scout mindset, reasoning transparency, and servant leadership). I think that AI safety has to scale to have a chance at solving alignment in time; this means that some people will end up working on counter-productive things. I can understand if your risk tolerance is different from mine, or if you are more skeptical about the impact of MATS or the founders who might be inspired by my post.
I do think I’d feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low than actively bad impact—but to be clear I didn’t really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people’s whole careers, which is wild; I think it’s easy to underestimate how valuable this can be. But I think it’s similarly easy to underestimate their risk, given that the source of this leverage—that you’re investing relatively little time getting to know them, etc., relative to the time they’ll spend doing… something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it’s worth taking unusual care, when pursuing multiplier-effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example, whether A) there might be lemons-market problems, such that those who are easiest to influence (especially quickly) might tend, all else equal, to be more strategically confused/confusable, or B) there might in fact currently be more easy ways to make AI risk worse than better, etc.
Edit: I mistakenly said “27% at frontier labs” when I should have said “27% at for-profit companies”. Also, note that this is 27% of those working on AI safety (80%), so 22% of all alumni.
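For concreteness, here is the arithmetic behind that correction, using only the 80% and 27% figures quoted above:

$$0.27 \times 0.80 = 0.216 \approx 22\%$$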
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result.
It is hard to predict this, but I think we could have done better (and can do better in the future still).
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn’t made the problem known. I roughly can’t imagine worlds where we survive in which the problem wasn’t made known, and with a problem this confusing it should be pretty expected that initially people will have little idea how to help, and so many initial attempts won’t. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn’t what you mean exactly, and there are ways we could have made the problem known that seemed less like “freaking out”? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone.)
My question is, how do you make AI risk known while minimizing the risk of paradoxical impacts? “Never talk about it” is the wrong answer, but I expect there’s a way to do better than we’ve done so far. This seems like an important thing to try to understand.
I like the phrase “paradoxical impact”.
I feel considerations around paradoxical impact are a big part of my world model, and I would like to see more discussion about it.
See my post on pessimization.