Yann’s core argument for why AGI safety is easy is interesting, and actually echoes ongoing AGI safety research. I’ll paraphrase his list of five reasons that things will go well if we’re not “ridiculously stupid”:
1. We’ll give AGIs non-open-ended objectives like fetching coffee. These are task-limited, so there are no further instrumental subgoals once the task is complete.
2. We will put “simple terms in the objective” to prevent obvious problems, presumably things like “don’t harm people”, “don’t violate laws”, etc.
3. We will put in “a mechanism” to edit the objective upon observing bad behavior. (A toy sketch of items 1–3 follows this list.)
4. We can physically destroy a computer housing an AGI.
5. We can build a second AGI whose sole purpose is to destroy the first AGI if the first has gotten out of control; the second will succeed because it’s more specialized.
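To make items 1–3 concrete, here’s a minimal toy sketch (my own illustration; every name, field, and weight in it is a hypothetical assumption, not anything LeCun specified) of a bounded task reward plus hand-coded penalty terms, with a weight an overseer can edit:

```python
# Toy sketch of a task-limited objective with hand-coded penalty terms.
# Everything here (names, weights, state fields) is an illustrative
# assumption, not LeCun's actual proposal.

def task_reward(state) -> float:
    # Item (1): reward only for completing the bounded task, and nothing
    # afterwards -- the "non-open-ended" part.
    return 1.0 if state["coffee_delivered"] else 0.0

def penalty(state, weights) -> float:
    # Item (2): "simple terms in the objective" penalizing obvious harms.
    return (weights["harm"] * state["humans_harmed"]
            + weights["law"] * state["laws_broken"])

def objective(state, weights) -> float:
    return task_reward(state) - penalty(state, weights)

# Item (3): the "mechanism" to edit the objective -- here just mutable
# weights an overseer can raise after observing bad behavior.
weights = {"harm": 100.0, "law": 10.0}
state = {"coffee_delivered": True, "humans_harmed": 0, "laws_broken": 1}
print(objective(state, weights))  # 1.0 - (100*0 + 10*1) = -9.0
```

Even in this toy the difficulties are visible: the penalty terms cover only the harms someone thought to enumerate, and nothing stops a capable agent from treating the weight-editing mechanism itself as an obstacle.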
All of these are reasonable ideas on their face, and indeed they’re similar to ongoing AGI safety research programs: (1) is myopic or task-limited AGI, (2) is related to AGI limiting and norm-following, (3) is corrigibility, (4) is boxing, and (5) is in the subfield of AIs-helping-with-AGI-safety (other things in this area include IDA, adversarial testing, recursive reward modeling, etc.).
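For (5), the shape of the idea in deliberately naive toy form (all names here are hypothetical, and real schemes like debate or recursive reward modeling are much richer than an action filter) is a specialized monitor with veto power over a more general agent:

```python
# Naive "AI policing AI" pattern: a narrow monitor with veto power over a
# more general agent. Purely illustrative; this sketch assumes the monitor
# sees and correctly judges every action, which is the hard part in practice.

from typing import Callable, Iterable

def run_with_watchdog(agent_actions: Iterable[str],
                      is_unsafe: Callable[[str], bool],
                      execute: Callable[[str], None]) -> None:
    for action in agent_actions:
        if is_unsafe(action):  # the specialized system's sole job
            print(f"watchdog: blocked {action!r}, halting agent")
            return
        execute(action)

run_with_watchdog(
    agent_actions=["fetch coffee", "disable off-switch"],
    is_unsafe=lambda a: "off-switch" in a,
    execute=lambda a: print(f"agent: {a}"),
)
```

The unanswered question, of course, is why the watchdog would be trustworthy and capable enough when the agent it polices is not.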
The problem, of course, is that all five of these things, when you look at them carefully, are much harder and more complicated than they appear, and/or less likely to succeed. And meanwhile he’s discouraging people from doing the work to solve those problems. :-(
I don’t know that his arguments “echo” ongoing research; it’s more that they “can be translated into existing discourse”. For example, the leap from his (5) to IDA is massive, and I don’t understand why he imagines tackling the “we can’t align AGIs” problem with “build another AGI to stop the bad AGI”.
I think (5) is much closer to the claim “look, the first goal is to build a system that prevents anyone else from building unaligned AGI”, and there’s a separate claim (6) of the form “more generally, we can use AGI to police AGI” that is similar to debate or IDA. And I think claim (5) is basically in line with what, say, Bostrom would discuss (where stabilization is a thing to do before we attempt to build a sovereign).
You mean in the sense of stabilizing the whole world? I’d be surprised if that’s what Yann had in mind. I took him just to mean building a specialized AI to be a check on a single other AI.
That’s how I interpreted:
the defensive AI systems designed to protect against rogue AI systems are not akin to the military, they are akin to the police, to law enforcement. Their “jurisdiction” would be strictly AI systems, not humans.
To be clear, I think he would mean it more in the way that there’s currently an international police order that is moderately difficult to circumvent, and that the same would be true for AGI, and not necessarily the more intense variants of stabilization (which are necessary primarily if you think offense is highly advantaged over defense; I don’t know his opinion on that).
Discouraging everyone, including AI researchers, or discouraging an AI safety movement that is disjoint from AI research?
No idea why this is heavily downvoted; strong upvoted to compensate.
I’d say he’s discouraging everyone from working on the problems, or at least from considering such work to be important, urgent, high status, etc.
I downvoted TAG’s comment because I found it confusing/misleading. I can’t tell which of these things TAG’s trying to do:
Assert, in a snarky/indirect way, that people agitating about AI safety have no overlap with AI researchers. This seems doubly weird in a conversation with Stuart Russell.
Suggest that LeCun believes this. (??)
Assert that LeCun doesn’t mean to discourage Russell’s research. (But the whole conversation seems to be about what kind of research people should be doing, and when, in order to get good outcomes from AI.)
You could have asked for clarification. The point is that Yudkowsky’s early movement was disjoint from actual AI research, and during that period a bunch of dogmas and approaches became solidified, which a lot of AI researchers (Russell is an exception) find incomprehensible or misguided. In other words, you can disapprove of amateur AI safety without dismissing AI safety wholesale.
(Responding to the above comment years later...)
It seems like “amateur” AI safety researchers have been the main ones willing to seriously think about AGI and on-the-horizon advanced AI systems from a safety angle though.
However, I do think you’re pointing to a key potential blindspot in the AI safety community. Fortunately AI safety folks are studying ML more, and I think ML researchers are starting to be more receptive to discussions about AGI and safety. So this may become a moot point.