Your typology of alternatives to direct research is logical. But those alternatives presuppose a less likely future. The likely timeline is human-level AI (we are here) → superintelligence (no pause) → AI controls the world.
If you can solve the big alignment problem—adequate values for an autonomous superintelligence—then those other problems will probably be solved, by the superintelligence. And as always, if superintelligence comes out badly misaligned, there’ll be nothing we can do about that or anything else. So the big alignment problem remains the most important one.
I plan to write something longer about this in the future, but people use “alignment” to refer to two different things. Thing 1 is “ASI solves ethics and then behaves ethically”; thing 2 is “ASI does what people want it to do”. Approximately nobody is working on thing 1, only on thing 2, and thing 2 doesn’t get us a solution to the non-alignment problems.
I think Ilya is working on thing 1.
He is quite explicit in his latest interview (which was published after your comment, https://www.dwarkesh.com/p/ilya-sutskever-2) that he wants sentient AI systems caring about all sentient beings.
(I don’t know if he is competitive, though; he says he has enough compute, and that might be the case, but he is quoting 5–20-year timelines, which seems rather slow these days.)
Good callout. I was glad to hear that Ilya is thinking about all sentient life and not just humans.
I didn’t interpret it to mean that he’s working on thing 1. The direct quote was:
I think in particular, there’s a case to be made that it will be easier to build an AI that cares about sentient life than an AI that cares about human life alone, because the AI itself will be sentient. And if you think about things like mirror neurons and human empathy for animals, which you might argue it’s not big enough, but it exists. I think it’s an emergent property from the fact that we model others with the same circuit that we use to model ourselves, because that’s the most efficient thing to do.
Sounds to me like he expects an aligned AI to care about all sentient beings, but he isn’t necessarily working on making that happen. AFAIK Ilya’s new venture hasn’t published any alignment research yet, so we don’t know what exactly he’s working on.
In his earlier thinking (~2023) he was also quite focused on non-standard approaches to AI existential safety, and it was clear that he was expecting to collaborate with advanced AI systems on that.
That’s indirect evidence, but it does look like he is continuing in the same mindset.
It would be nice if his org found ways to publish those aspects of their activity that might contribute to AI existential safety[[1]].
Since almost everyone is using “alignment” for “thing 2” these days, I am trying to avoid the word; I doubt solving “thing 2” would contribute much to existential safety, and I can easily see how it might turn counterproductive instead.
I do agree with that. I also think it might be worth diverting a rather small percentage of effort towards figuring out what we actually want from and for AI development, in the worlds where that turns out to be possible. At the very least, we can generate some better training data and give models higher-quality feedback.