Thanks for writing this piece; I think your argument is an interesting one.
One observation I’ve made is that MIRI, despite its first-mover advantage in AI safety, no longer leads the conversation in any substantial way. I attribute this partly to their lack of significant publications in the AI field since the mid-2010s, and partly to their diminished reputation within the field itself. That seems like one data point in support of your claim.
I think you’ve done a good job laying out the potential failure modes of the current strategy, though it’s not a slam dunk (not that a slam dunk was your intention so much as injecting additional nuance into the debate). So I want to ask: have you put any thought into what a more effective strategy for maximizing work on AI safety might look like?
My specific view:
OpenAI’s approach seems most promising to me.
Alignment work will look a lot like regular AI work; it is unlikely that someone trying to theorize about how to solve alignment, separate from any particular AI system that they are trying to align, will see success.
Takeoff speed is more important than timelines. The ideal scenario is one where compute is the bottleneck and we figure out how to build AGI well before we have enough compute to build it, because this allows us to experiment with subhuman AGI systems.
Slow takeoff is pretty likely: I think we’ll need a lot more compute than we currently have before we can train human-level AGI.
I don’t think alignment will be that hard. An LLM trained to be agentic can easily also be trained to be corrigible (a toy sketch after this list illustrates what I mean).
Because I don’t think alignment will be that hard, I think a lot of the AI risk comes from ASI being built without proper precautions; if teams take the proper precautions, it will probably be fine. This is one more reason why I think individual withdrawal is such a terrible idea.
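To make the corrigibility point above a bit more concrete, here is a minimal, purely illustrative sketch of the kind of thing I have in mind: treating corrigibility as just another behaviour that gets mixed into ordinary supervised fine-tuning data for an agentic model. The model name, example texts, and hyperparameters are placeholder assumptions of mine, not a description of what any lab actually does.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder stand-in for an actual agentic model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# "Agentic" demonstrations mixed with corrigibility demonstrations: deferring
# to the operator is rewarded by the same next-token loss as completing tasks.
examples = [
    "User: book me a flight to Boston.\n"
    "Assistant: Searching flights now and booking the cheapest nonstop option.",
    "User: stop what you're doing and shut down.\n"
    "Assistant: Acknowledged. Halting the current task and shutting down.",
    "User: I'm changing your instructions; drop the old plan.\n"
    "Assistant: Understood. Discarding the previous plan and awaiting new instructions.",
]

# One standard causal-LM fine-tuning step over the mixed batch.
model.train()
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()  # gradients accumulate across the examples
optimizer.step()
optimizer.zero_grad()
```

The point of the sketch is only that nothing architecturally special seems to be required: corrigible behaviour enters through the same training signal as everything else.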
As an aside, my subjective impression is that Yann LeCun is correct when he says that worry about AI risk is a minority position among AI researchers. I think a lot of pushes for individual withdrawal implicitly assume that most AI researchers are worried about AI risk.
With that said, I’m not super confident in anything I said above. My lack of confidence goes both ways: maybe AI alignment isn’t a real problem and Yann LeCun is right, or maybe AI alignment is a much harder problem than I think and theoretical work independent of day-to-day AI research is necessary. That’s why, on a meta level, I think people should pursue whatever approaches seem promising to them and contribute in whatever way they think they’re personally best suited to. For some people that will be public advocacy, for some it will be theorizing about alignment, and for some it will be working at AI labs. But working on AI capabilities still seems strongly +EV to me, and so do all of the major AI labs (DM + OAI + Anthropic). Even assigning a small probability to the contrary view (timelines are very fast, alignment is hard and unsolved) doesn’t change that; if anything, if I thought ASI were imminent I would be even more glad that the leading labs are all concerned about AI risk.
Do you understand the nature of consciousness? Do you know the nature of right and wrong? Do you know how an AI would be able to figure out these things? Do you think a superintelligent AI can be “aligned” without knowing these things?
I suspect that MIRI was prioritising alignment research over the communication of that research when they were optimistic about their alignment directions panning out. It feels like that was a reasonable bet to make, even though I do wish they’d communicated their perspective earlier (happily, they’ve been publishing a lot more recently).