Thanks to Mikhail Samin for writing one of the most persuasive and important articles I’ve read on LessWrong.
I think a lot of the dubious, skeptical, or hostile comments on this post reflect some profound cognitive dissonance.
Rationalists and EAs were generally very supportive of OpenAI at first, and 80,000 Hours encouraged people to work there; then OpenAI betrayed our trust and violated most of the safety commitments they made. So, we were fooled once.
Then, Rationalists and EAs were generally very supportive of Anthropic, and 80,000 Hours encouraged well-meaning people to work there; then Anthropic betrayed our trust and violated most of the safety commitments they made. So, we were fooled twice, which is embarrassing, and we find ways to cope with our embarrassment, gullibility, and naiveté.
What’s the lesson from OpenAI and Anthropic betraying our trust so massively and recklessly?
The lesson is simply about human nature. People are willing to sell their souls. A mid-level hit man is willing to kill someone for about $50,000. A cyber-scammer is willing to defraud thousands of elderly people for a few million dollars. Sam Altman was willing to betray AI Safety to achieve his current net worth of (allegedly) about $2.1 billion. Dario Amodei was willing to betray AI Safety to achieve his current net worth of (allegedly) about $3.7 billion. If the AI bubble doesn’t burst soon, they’ll each probably be worth over $10 billion within a couple of years.
So, we should have expected that almost anyone, no matter how well-meaning and principled, would eventually succumb to the greed, hubris, and thrills of trying to build Artificial Superintelligence. We like to think that we’d never sell our souls or compromise our principles for $10 billion. But millions of humans compromise their principles, every day, for much, much less than that.
Why exactly did we think Sam Altman or Dario Amodei would be any different? Because they were ‘friendlies’? Allies to the Rationalist cause? EA-adjacent? Long-termists who cared about the future?
None of that matters to ordinary humans when they’re facing the prospect of winning billions of dollars—and all they have to do is a bit of rationalization and self-deception, get some social validation from naive worshippers/employees, and tap into that inner streak of sociopathy that is latent in most of us.
In other words, Anthropic’s utter betrayal of Rationalists, and EAs, and humanity, should have been one of the least surprising developments in the entire tech industry. Instead, here we are, trading various copes and excuses for this company’s rapid descent from ‘probably well-intentioned’ to ‘shamelessly evil’.
“we should have expected that almost anyone, no matter how well-meaning and principled, would eventually succumb to the greed, hubris, and thrills of trying to build Artificial Superintelligence.”
I don’t think this is at all true. I think most people would not do that. I think those company heads are pretty exceptional (but probably not extremely exceptional).
Whether I’m correct or incorrect about that, I think this is a relevant question because if it is exceptional, then it implies a lot of stuff. For example:
it implies that actually maybe you can mostly convince most people to not do AGI capabilities research;
it implies that actually maybe you can avoid investing community resources (money, talent, public support, credibility, etc.) into people who would do this—but you would have to be better at detecting them;
it implies that we aren’t de facto good enough at detecting such people (“detecting” broadly construed to include creating mutual knowledge of that, common knowledge, etc.).
TsviBT—I can’t actually follow what you’re saying here. Could you please rephrase a little more directly and clearly? I’d like to understand your point. Thanks!
“I don’t think this is at all true. I think most people would not do that. I think those company heads are pretty exceptional (but probably not extremely exceptional).”
In this part I’m disagreeing with what I understand to be your proposed explanation for the situation. I think you’re trying to explain why “we” (Rationalists and EAs) were fooled by e.g. Sam and Dario (and you’re suggesting an update we should make, and other consequences). I think your explanation is that we did not understand that of course leaders of AI companies would pursue AI because almost anybody would in that position. I agree that “we” are under-weighting “people are just fine with risking everyone’s lives because of greed, hubris, and thrills”, and I personally don’t know how to update / what to believe about that, but I don’t think the answer is actually “most people would do the same”. I don’t think most people would e.g. lead a big coordinated PR campaign to get lots of young people to smoke lots of cigarettes, because they wouldn’t want to hurt people. (I don’t think this is obvious; most people also couldn’t do that, so it’s hard to tell.)
I’m disagreeing and saying many people would not do that.
“Whether I’m correct or incorrect about that, I think this is a relevant question because …”
Then I’m explaining some of why I care about whether your explanation is correct.