Great founders and field-builders have multiplier effects on recruiting, training, and deploying talent to work on AI safety [...] If we want to 10-100x the AI safety field in the next 8 years, we need multiplicative capacity, not just marginal hires
I spent much of 2018-2020 trying to help MIRI with recruiting at AIRCS workshops. At the time, I think AIRCS workshops and 80k were probably the most similar things the field had to MATS, and I decided to help with them largely because I was excited about the possibility of multiplier effects like these.
The single most obvious effect I had on a participant—i.e., where at the beginning of our conversations they seemed quite uninterested in working on AI safety, but by the end reported deciding to—was that a few months later they quit their (non-ML) job to work on capabilities at OpenAI, which they have been doing ever since.
Multiplier effects are real, and can be great; I think AIRCS probably had helpful multiplier effects too, and I’d guess the workshops were net positive overall. But much as pharmaceuticals often have paradoxical effects—i.e., they impact the intended system in roughly the intended way, except with the sign of the key effect flipped—it seems disturbingly common for efforts like these to have “paradoxical impact.”
I suspect the risk of paradoxical impact—even from your own work—is often substantial, especially in poorly understood domains. My favorite example of this is the career of Fritz Haber, who by discovering how to efficiently mass-produce fertilizer, explosives, and chemical weapons, seems plausibly to have both counterfactually killed and saved millions of lives.
But it’s even harder to predict the sign when the impact in question is on other people—e.g., on their choice of career—since you have limited visibility into their reasoning or goals, and nearly zero control over what actions they choose to take as a result. So I do think it’s worth being fairly paranoid about this in high-stakes, poorly-understood domains, and perhaps especially so in AI safety, where numerous such skulls have already appeared.
It is hard to predict this, but I think we could have done better (and can do better in the future still).
That may be, but personally I am unpersuaded that the observed paradoxical impacts should update us toward thinking the world would have been better off if we hadn’t made the problem known. I roughly can’t imagine worlds where we survive in which the problem wasn’t made known, and with a problem this confusing it should be pretty expected that initially people will have little idea how to help, so many initial attempts won’t. In my imagination, at least, basically all surviving worlds look like that at first, but then eventually people who were persuaded to worry about the problem do figure out how to solve it.
(Maybe this isn’t what you mean exactly, and there are ways we could have made the problem known that seemed less like “freaking out”? But to me this seems hard to achieve, when the problem in question is the plausibly relatively imminent death of everyone.)
I like the phrase “paradoxical impact”.
I feel like considerations around paradoxical impact are a big part of my world model, and I would like to see more discussion of them.