My main takeaway from what Dario said in that talk is that Anthropic is very determined to kick off the RSI loop and is willing to talk about it openly. Dario basically confirms that Claude Code is their straight shot at RSI, to get to superintelligence as fast as possible (starting RSI in 2026–2027). Notably, many AI labs do not explicitly target this, or at least don’t say so openly. While I think it is nice that Anthropic is doing alignment research, and that openly publishing their constitution is a good step, I think that if they do successfully kick off the RSI loop, they have very low odds of succeeding.
@Zac Hatfield-Dodds @evhub @Dave Orr @Ethan Perez @Carson Denison @Drake Thomas @gasteigerjo @Aram Ebtekar Can you comment on this? Is that what they are planning to work on? Were you aware of this? Do you think that’s a good thing to do?
(Epistemic status: third hand / rumor.) I heard that this is spoken about openly in Amodei’s ~weekly all-hands, and this has been the case for a ~year.
Just listened to the talk, and I’m confused about what people think Dario said. The only mention of RSI I noticed was at the end, when the moderator said: “Last question. What will have changed in a year?”
Dario said (paraphrasing) that the most important thing to watch this year is AIs building AIs, which could lead to wonders, or to a great emergency in front of us, if it turns out that works and causes a meaningful speedup.
Is that what you’re talking about? I’m pretty sure I agree that this is one of the most important things to pay attention to in the next year.
Would you please listen for 2 minutes to the beginning of the interview around the 1 minute mark, here:
https://www.youtube.com/watch?v=02YLwsCKUww&t=59s
Throughout the interview, it’s clear that Dario’s stated timelines are significantly shorter, and stated more confidently, than Demis’s; e.g. see here: https://youtu.be/02YLwsCKUww?t=1030
Then, if you listen here for a couple minutes:
https://youtu.be/02YLwsCKUww?t=1326
it’s clear that Dario’s stated reason for NOT slowing down HIS research, seemingly referring to the short-timelines research, i.e. the RSI research, is something about China.
There’s a relevant section near the beginning, when the moderator asks him about his timeline: https://youtu.be/9Zz2KrBDXUo?si=9kHfNC_ec-2yTk8e&t=61
Does Dario’s new essay make you feel better, or worse?
Did you mean to ask me, rather than someone else in this thread? (I had a quick look at the new essay but haven’t read it in full yet; what I did see didn’t seem very surprising, so I don’t really feel better or worse at this stage. Happy to say more once I’ve read it properly, but I think you might have meant to ask someone else, as I wasn’t really taking a position in this thread—just pointing out a relevant section of the talk that I thought you might have missed.)
I’ve seen this message, but per our confidentiality policy, no comment. (And don’t take the below as indicative either way; Glomar response, etc.)
I will note, though, that Dario did not actually talk about recursive self-improvement, nor about superintelligence; commentary on LessWrong often assumes a shared ontology that just doesn’t exist.
I also continue to endorse the confidentiality policy, and per my red lines I still trust Anthropic’s leadership and think the company is on net good for the world.
Can you, or have you elsewhere, elaborate on the presumed shared ontology that doesn’t exist? I think I’d expect you to have a pretty good sense of the delta, and it could be important for learning how to talk to each other more effectively.
Why do you think Anthropic has low odds of succeeding?
Getting RSI and a shot at superintelligence right just appears very difficult to me. I appreciate their constitution and found the parts I read thoughtful. But I don’t see that they have found a way to reliably get the model to truly internalize its soul document. I also assume that even if they had, there would be parts that break down once you get to really critical amounts of intelligence.
By “succeeding” you mean getting safe ASI, as opposed to getting any ASI at all, right? At least that’s how I read you, but at first I thought you meant “their RSI probably won’t lead to ASI”.
Their RSI very likely won’t lead to safe ASI. That’s what I meant, hope that clears it up. Whether it leads to ASI is a separate question.