As you say yourself, almost no one in your list works at CG, or is in any meaningful position of authority at CG, so this feels like a bit of an absurd interpretation.
A lot of Rob’s complaints are about things that happened in the past, so I don’t think it’s crazy to interpret him as talking about people who worked at CG in the past.
I think the things Rob is saying still have some strawman-y nature to them, but I think they are reasonably accurate descriptors of Anthropic leadership, plus my best guesses of what Alexander (head of Coefficient Giving) and Zach (head of CEA) believe. That group seems well-described by “Dario and a cluster of Open-Phil-ish people”, and of course also constitutes an enormous fraction of the authority over broader EA.
A lot of Rob’s complaints are about things that happened in the past, so I don’t think it’s crazy to interpret him as talking about people who worked at CG in the past.
I think that these people believe different things, and I don’t think Rob’s post particularly accurately describes any of them. For example, the Anthropic leadership doesn’t really think of themselves as trying to coordinate with AI safety people or trying to suppress them. I don’t think Alexander thinks “AI is going to become vastly superhuman in the near future” (and fwiw I don’t think Dario thinks that either, he doesn’t seem to believe in returns to intelligence substantially above human-level).
(sending quickly, I might be wrong)
Fair enough. I think that the people you list also used to believe things closer to what Rob is saying in the past, so at least we need to do a consistent comparison. Holden from 10 years ago seems to say a lot of the things that Rob is saying here, and Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2).
My guess is that it is worth digging up quotes here, but it’s a lot of work, so I am not going to do it for now; if it turns out to be cruxy, I can.
(Again, I don’t think these are centrally the people Rob is talking about in either case. I think centrally he is talking about Anthropic, and then secondarily talking about how Open Phil people have related to Anthropic over the years, but I do still think his criticism is correct directionally for those people)
I don’t think Alexander thinks “AI is going to become vastly superhuman in the near future” (and fwiw I don’t think Dario thinks that either, he doesn’t seem to believe in returns to intelligence substantially above human-level).
I think Alexander abstractly believes that AI could very well become vastly superhuman in the near future, but yes, like Dario, he does not believe that speculating about such a thing in a non-scientific, non-empirical way is appropriate, and as such neither of them has coherent beliefs about this. Indeed, that seems like quite a central match to what Rob is saying.
Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2).
I don’t remember anything like this. I think it might be misremembered or a strained interpretation.
Here are points 1 and 3 for reference:
1. AI is going to become vastly superhuman in the near future; but being a good scientist means refusing to speculate about the potential novel risks this may pose. Instead, we should only expect risks that we can clearly see today, and that seem difficult to address today.
3. In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we’re well-positioned to act if and when an important opportunity arises.
I asked ChatGPT to read bioanchors (where I thought this was most likely to occur), and then to read all of her other writings looking for anything that fits that mode. Here’s its reply, not finding anything.
The closest match it finds is that Ajeya often caveats her claims. For example from bio anchors:
This is a work in progress and does not represent Open Philanthropy’s institutional view […] Accordingly, we have not done an official publication or blog post, and would prefer for now that people not share it widely in a low bandwidth way.
I don’t think this matches points 1 or 3 well.
Huh, I am a bit confused about you summarizing that ChatGPT response that way. Maybe we are talking past each other, but Robby’s statements are not intended as the kind of statement that passes people’s ITT (which IMO is fine, frequently summaries of other people’s views should not pass their ITT, though it should ideally be caveated when this is going on).
Despite that, your ChatGPT transcript says:
Yes—there are clear resonances with both of your points, though mostly as counterpressures or explicit methodological caveats rather than direct endorsements. The strongest matches are in how Cotra frames forecasting discipline under radical uncertainty and how she handles communication norms around high-stakes speculative claims.
I am not expecting any direct endorsements of these statements (which are phrased so as to make their internal contradictions most obvious), so this ChatGPT response seems compatible with what I am saying?
When I asked ChatGPT to “rephrase these two beliefs in more neutral language that would make more sense for someone to endorse (but try to pretty tightly imply the above)” it gave these two:
1. AI may become far more capable soon, but risk assessment should remain tightly tied to currently observable systems and evidence, not to conjectures about novel future dangers.
3. AI risk advocates should be selective and disciplined in how they present their concerns, emphasizing messages that are most likely to preserve credibility, attract allies, and strengthen their long-term influence.
When I asked ChatGPT about this framing, it said:
Using Cotra’s public bio-anchors materials that I could directly inspect — especially her draft-report announcement, her long AXRP explanation of the framework, and later timeline/milestone essays — my read is: your first point gets a qualified yes, while your third point gets a strong yes.
But also, when we are in the domain of “evaluate whether Ajeya said things that imply the things above and result in other people getting the same vibe as the above”, ChatGPT and Claude seem like much worse judges, so I think this question becomes more difficult to answer and I wouldn’t super defer to the language models (which is part of why I expected it would take a while to dig up quotes and do the work and stuff).
(If you want to complain that Robby should have caveated his stuff more as not being the kind of thing that passes people’s ITT, then I am happy to argue about that. I think a better post would have done it, but it’s not something I think is always necessary to do.)
(Also, just for the sake of completeness: I don’t get this vibe from Ajeya at all these days and have no complaints on this front, besides probably still some strategic disagreement on stuff around point 3, but only at the level of disagreement I have with many people I respect, almost certainly including you.)
When you wrote:
Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2)
I interpreted you as claiming that Ajeya had said “things more like:”
In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we’re well-positioned to act if and when an important opportunity arises.
I don’t recall any examples of Ajeya saying or implying anything at all like that. I asked ChatGPT to try to find examples and I think it didn’t find anything.
In your ChatGPT session, a typical example it cites is:
In the AXRP discussion, she also says there were concerns that making the report seem too slick or official could increase capabilities interest.
I think those examples don’t meaningfully support the original claim, at least as a typical reader would understand it.
In your ChatGPT session, a typical example it cites is:
In the AXRP discussion, she also says there were concerns that making the report seem too slick or official could increase capabilities interest.
I think those examples don’t meaningfully support the original claim, at least as a typical reader would understand it.
I have no interest in defending ChatGPT’s claims here, and feel like I caveated that pretty explicitly. I agree that quote is largely irrelevant.
I asked ChatGPT to try to find examples and it didn’t find anything.
Yep, I agree with you that ChatGPT did not find any clear quotes (though it doesn’t look like ChatGPT tried very hard to find quotes). I disagree that it didn’t find “anything at all like that” (indeed ChatGPT is quite explicit that it found some things “kind of like that”).
I don’t recall any examples of Ajeya saying or implying anything at all like that.
I do. As I said, I could go and dig them up, but it would take quite a while, and I am only like 75% confident they are written up somewhere, as opposed to being from conversations or private Google Docs or something I would have trouble finding. It was a strong vibe I got at the time, and I remember having a few adjacent conversations back then, either with Ajeya or about Ajeya.
Let me know if you want me to do this. I don’t quite know what’s at stake here for you, and I feel somewhat like we are talking past each other, so before I do that it would probably be more productive to go up a meta-level, but I am not quite sure.
I think you’re right, and also it seems misleading / like a bad clustering to lump “the EAs” in with “Anthropic’s leadership”. I think those groups have some memetic connections, but they’re not the same group!
I feel like it’s more of a reasonable carving to lump in Open Phil with “the EAs”, since they were/are effectively EA thought-leaders and they exerted a lot of influence, directly and indirectly.
I think you’re right, and also it seems misleading / like a bad clustering to lump “the EAs” in with “Anthropic’s leadership”. I think those groups have some memetic connections, but they’re not the same group!
More than 50% of the talent-weighted safety people in EA are literally employees of Anthropic! The ex-CEO of Open Phil now works at Anthropic, and is married to one of its founders. These groups have enormous overlap.
Like, there is such enormous overlap, and the overlap results in such an enormous amount of de-facto deference (being an employee of a company is approximately the strongest common deference relationship we have) that it makes sense to think of these as closely intertwined.
Yes, there are people who attach the EA label to themselves who are different here, sometimes even quite substantial clusters. But it’s also IMO clear from Scott’s response that he himself is majorly deferring to, and majorly supportive of, Anthropic as a representative of EA, so this clearly isn’t just a split between “everyone who works at Anthropic and everyone who doesn’t”.
Rob used “Open Phil” exactly two times: once saying “a cluster of Dario and Open-Phil-ish people” and once saying “EAs / Open Phil” in reference to the broader community that includes all of these things. These seem like totally reasonable ways of using these pointers and words. I don’t have anything better. It’s definitely not “just Anthropic”, as I think Scott very unambiguously demonstrates, and it would of course be extremely confusing to refer to Scott as “Anthropic”.
Imagine, re Open Phil and hardcore rationalists: “the ex-CEO of MIRI now works at Open Phil, and the CEO of Lightcone is dating an Open Phil employee. These groups have enormous overlap.”
Yes. People can have a lot of social overlap, yet have very different views from one another, especially in the broader Bay Area intellectual ecosystem. My sense is that Anthropic leadership has very different views from most AI safety EAs.
More than 50% of the talent-weighted safety people in EA are literally employees of Anthropic!
Why do you think this? I’m skeptical this is true, especially if you’re including non-technical talent.
Why do you think this? I’m skeptical this is true, especially if you’re including non-technical talent.
IDK, I counted them? I made some spreadsheets over the years, and ran this number by a bunch of other people, and my current guess is that it’s around 55%? When I list organizations with full-time employees working in safety, I actually end up with substantially more than 50% of the people working at Anthropic, but I think that’s overcounting.
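(As an illustration of what a “talent-weighted” share means here, a minimal sketch in Python; the names, employers, and weights below are hypothetical placeholders, not the actual spreadsheet or numbers.)

```python
# Hypothetical sketch of a talent-weighted share calculation.
# Each entry is (name, employer, talent_weight); all values are made up.
people = [
    ("Researcher A", "Anthropic", 3.0),
    ("Researcher B", "Anthropic", 1.0),
    ("Researcher C", "Redwood", 2.0),
    ("Researcher D", "MIRI", 1.5),
]

total_weight = sum(w for _, _, w in people)
anthropic_weight = sum(w for _, org, w in people if org == "Anthropic")

# With these placeholder numbers this prints roughly 53%.
print(f"Talent-weighted Anthropic share: {anthropic_weight / total_weight:.0%}")
```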
My sense is that Anthropic leadership has very different views from most AI safety EAs.
I think there are differences and overlaps. I think Rob points to a thing that is shared across a cluster that spans both of them, and has historically had a lot of influence.