Dario says he’d “go out there saying that everyone should stop building [AI]” if safety techniques do not progress alongside capabilities.
He is just saying “if we made no progress on safety, then we should slow down”, but he clearly expects progress on safety, and so doesn’t expect to need a slowdown of any kind. It’s a pretty weak statement, though I am still glad to hear that he doesn’t think he could just literally scale to Superintelligence right now with zero progress on safety (though that’s a low bar).
The idea that we can kind of logically prove that there’s no way to make them safe, that seems like nonsense to me.
Also this is just such a totally random strawman. There is approximately no one who believes it can be logically proven that there is no way to make these systems safe. This characterization of “doomer” really has no basis in reality. It’s a hard problem. We don’t know how hard. It’s pretty clear it’s somehow solvable, but it might require going much slower and being much more careful than we are going right now.
I think Amodei was trying to refer to the belief that “[i]f any company or group [...] builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.”
I don’t think he was being careful with the phrase “logically prove”, but in the context of speaking extemporaneously in an interview, I’m inclined to cut him some slack on that. It’s not “totally random”; the point is that Amodei thinks that Anthropic-style safety research on “these models” can succeed, in contrast to how e.g. Yudkowsky thinks the entire deep learning paradigm is itself unsafe (see List of Lethalities #16–19).
I… disagree?
The central thing he is trying to do in that sentence is to paint one position as extreme in order to justify dismissing it. The sentence “The idea that we can confidently argue that there is no way to make these systems safe by just continuing to do the same kind of research we’ve already been doing at Anthropic, that seems like nonsense to me” has hugely different connotative and semantic content, and would not remotely land the same way.
The whole thing he is saying there is dependent on him using extreme language, and that extreme language only really lands if you use the “kind of logically prove” construction.
(Edit: Rephrased the hypothetical quote a bit since what I originally said was a bit muddled)
OK, I think I see it: the revised sentence “The idea that we can confidently argue that there’s no way to make them safe, that seems like nonsense to me” wouldn’t land, because it’s more natural to say that the arguments are nonsense (or “gobbledegook”) than that the idea of such arguments is nonsense.
I definitely don’t recognize Eliezer in Dario’s characterization. My best guess is he’s referring to Roman Yampolskiy, who was recently on Rogan, and who has published results he claims prove alignment is impossible here: https://arxiv.org/abs/2109.00484
It would be both more timely and more accurate for Dario to be referring to Yampolskiy.
Some of the protest groups make similar claims that superintelligence can never be safe and so should never be developed. This is actually very distant from Eliezer’s position.
That would be my interpretation if I were to steelman him. My actual expectation is that he’s lumping Eliezer-style positions with Yampolskiy-style positions, barely differentiating between them. Eliezer has certainly said things along the general lines of “AGI can never be made aligned using the tools of the current paradigm”, backing it up by what could be called “logical arguments” from evolution or first principles.
Like, Dario clearly disagrees with Eliezer’s position as well, given who he is and what he is doing, so there must be some way he is dismissing it. And he is talking about “doomers” there, in general, yet Yampolskiy and Yampolskiy-style views are not the central AGI-doomsayer position. So why would he be talking about his anti-Yampolskiy views in the place where he should be talking about his anti-Eliezer views?
My guess is that it’s because those views are one and the same. Alternatively, he deliberately chose to associate general AGI-doom arguments with a weak-man position he could dunk on, in a way that leaves him the opportunity to retreat to the motte of “I actually meant Yampolskiy’s views, oops, sorry for causing a misunderstanding”. Not sure which is worse.
Yes, his statement is clearly nonsensical if we read it as a dismissal of Eliezer’s position, but it sure sounded, in context, like he would’ve been referring to Eliezer’s position there. So I expect the nonsense is because he’s mischaracterizing (deliberately or not) that position; I’m not particularly inclined to search for complicated charitable interpretations.
I agree that Dario disagrees with Eliezer somewhere. I don’t know for sure that you’ve isolated the part that Dario disagrees with, and it seems plausible to me that Dario thinks we need some more MIRI-esque, principled thing, or an alternative architecture altogether, or for the LLMs to have solved the problem for us, once we cross some capabilities threshold. If he’s said something public about this either way, I’d love to know.
I also think that some interpretations of Dario’s statement are compatible with some interpretations of the section of the IABIED book excerpt above, so we ought to just… all be extra careful not to be too generous to one side or the other, or too critical of one side or the other. I agree that my interpretation errs on the side of giving Dario too much credit here.
I’m pretty confused about Dario and don’t trust him, but I want to gesture toward some care about the intended targets of some of his stronger statements about ‘doomers’. I think he’s a pretty careful communicator, and still lean toward my interpretation over yours (although I also expect him to be wrong in his characterization of Eliezer’s beliefs, I don’t expect him to be quite as wrong as the above).
I find the story you’re telling here totally plausible, and just genuinely do not know.
There’s also a meta concern: if you decide that you’re the target of some inaccurate statement that’s certainly targeted at someone but might not be targeted at you, you’ve perhaps done more damage to yourself by adopting that mischaracterization of yourself in order to amend it than by saying something like “Well, you must not be talking about me, because that’s just not what I believe.”
I think there might exist people who feel that way (e.g. reactors above) but Yudkowsky/Soares, the most prominent doomers (?), are on the record saying they think alignment is in principle possible, e.g. opening paragraphs of List of Lethalities. It feels like a disingenuous strawman to me for Dario to dismiss doomers with.
Coming back to Amodei’s quote, he says (my emphasis):
The idea that these models have dangers associated with them, including dangers to humanity as a whole, that makes sense to me. The idea that we can kind of logically prove that there’s no way to make them safe, that seems like nonsense to me.
So “them” in “there’s no way to make them safe” refers to LLMs, not to all possible AGI methods. Yudkowsky-2022 in List of Lethalities does indeed claim that AGI alignment is in principle possible, but doesn’t claim that AGI-LLM alignment is in principle possible. In the section you link, he wrote:
The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned superintelligence in six months.
My mainline interpretation is that LLMs are not a “simple idea that actually works robustly in practice”, and the imagined textbook from the future would contain different ideas instead. List of Lethalities isn’t saying that AGI-LLM alignment is impossible, but also isn’t saying that it is possible.
(still arguably hyperbole to say “kind of logically prove”)
the entire deep learning paradigm is itself unsafe
yudkowsky is such a goofball about deep learning. a thing I believe: the strongest version of alignment, where there is no step during the training process that ever produces any amount of misaligned cognition whatsoever, if it’s possible to do at all, is possible to do with deep learning. I also think it’s not significantly harder to do with deep learning than some other way. And I think it’s possible to do at all. Justification post pending me convincing myself to write a bad post rather than no post, and/or someone asking me questions that make me write down things that clarify this. if someone wanted to grill me in an lw dialogue I’d be down.
I don’t know enough about the subject matter to grill you in detail, but I’d certainly love to see a post about this. (Or even a long comment.) The obvious big questions are “why do you believe that” but also “how can you possibly know that”—after all, who knows what AI-related techniques and technologies remain undiscovered? Surely you can’t know whether some of them make it easier to produce aligned AIs than deep learning…?