I’m sorry to hear about your paradoxical impact; this sounds tough and it’s a fear I share. I feel a bit better about MATS’ impact because very few of our alumni work on AI capabilities at frontier labs (~2% by my estimation) and very few work at OpenAI altogether, but I can understand if you feel that the 22% working on AI safety at for-profit companies are primarily doing “safetywashing” or something (on net I disagree, but it’s a valid concern).
I think there is something for me to learn from your experience: at the time MIRI was running AIRCS, OpenAI was not an AI safety pariah; it’s possible that some of the companies MATS alums join now will become pariahs in future, revealing paradoxical impact. I’m not sure what to do about this other than encourage people to be intentional with their careers, question assumptions, and “don’t do evil” (the MATS values are impact first, scout mindset, reasoning transparency, and servant leadership). I think that AI safety has to scale to have a chance at solving alignment in time; this means that some people will end up working on counter-productive things. I can understand if your risk tolerance is different from mine, or if you are more skeptical about the impact of MATS or of the founders who might be inspired by my post.
I do think I’d feel very alarmed by the 27% figure in your position—much more alarmed than e.g. I am about what happened with AIRCS, which seems to me to have failed more in the direction of low impact than of actively bad impact—but to be clear, I didn’t really mean to express a claim here about the overall sign of MATS; I know little about the program.
Rather, my point is just that multiplier effects are scary for much the same reason they are exciting—they are in effect low-information, high-leverage bets. Sometimes single conversations can change the course of highly effective people’s whole careers, which is wild; I think it’s easy to underestimate how valuable this can be. But I think it’s similarly easy to underestimate the risk, given that the source of this leverage—that you’re investing little time getting to know them, etc., relative to the time they’ll spend doing… something as a result—also means you have unusually limited visibility into what the effects will be.
Given this, I think it’s worth taking unusual care, when pursuing multiplier-effect strategies, to model the overall relative symmetry of available risks/rewards in the domain. For example: A) there might be lemons-market problems, such that those who are easiest to influence (especially quickly) might tend, all else equal, to be more strategically confused/confusable; or B) there might in fact currently be more easy ways to make AI risk worse than better, etc.
Edit: I mistakenly said “27% at frontier labs” when I should have said “27% at for-profit companies”. Also, note that this is 27% of those working on AI safety (80%), so 22% of all alumni.