Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2).
I don’t remember anything like this. I think it might be a misremembering or a strained interpretation.
Here are points 1 and 3 for reference:
1. AI is going to become vastly superhuman in the near future; but being a good scientist means refusing to speculate about the potential novel risks this may pose. Instead, we should only expect risks that we can clearly see today, and that seem difficult to address today.
3. In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we’re well-positioned to act if and when an important opportunity arises.
I asked ChatGPT to read bio anchors (where I thought this was most likely to occur), and then to read all of her other writings, looking for anything that fits that pattern. Here’s its reply; it didn’t find anything.
The closest match it finds is that Ajeya often caveats her claims. For example from bio anchors:
This is a work in progress and does not represent Open Philanthropy’s institutional view […] Accordingly, we have not done an official publication or blog post, and would prefer for now that people not share it widely in a low bandwidth way.
I don’t think this matches points 1 or 3 well.
Huh, I am a bit confused about you summarizing that ChatGPT response that way. Maybe we are talking past each other, but Robby’s statements are not intended as the kind of statements that would pass people’s ITT (which IMO is fine; summaries of other people’s views frequently should not pass their ITT, though ideally this should be caveated when it is going on).
Despite that, your ChatGPT transcript says:
Yes—there are clear resonances with both of your points, though mostly as counterpressures or explicit methodological caveats rather than direct endorsements. The strongest matches are in how Cotra frames forecasting discipline under radical uncertainty and how she handles communication norms around high-stakes speculative claims.
I am not expecting any direct endorsements of these statements (which are phrased so as to make their internal contradictions most obvious), so this ChatGPT response seems compatible with what I am saying?
When I asked ChatGPT to “rephrase these two beliefs in more neutral language that would make more sense for someone to endorse (but try to pretty tightly imply the above)” it gave these two:
1. AI may become far more capable soon, but risk assessment should remain tightly tied to currently observable systems and evidence, not to conjectures about novel future dangers.
3. AI risk advocates should be selective and disciplined in how they present their concerns, emphasizing messages that are most likely to preserve credibility, attract allies, and strengthen their long-term influence.
When I asked ChatGPT about this framing, it said:
Using Cotra’s public bio-anchors materials that I could directly inspect — especially her draft-report announcement, her long AXRP explanation of the framework, and later timeline/milestone essays — my read is: your first point gets a qualified yes, while your third point gets a strong yes.
But also, when we are in the domain of “evaluate whether Ajeya said things that imply the things above and result in other people getting the same vibe as the above”, ChatGPT and Claude seem like much worse judges, so I think this question becomes more difficult to answer and I wouldn’t super defer to the language models (which is part of why I expected it would take a while to dig up quotes and do the work).
(If you want to complain that Robby should have caveated his stuff more as not being the kind of thing that passes people’s ITT, then I am happy to argue about that. I think a better post would have done it, but it’s not something I think is always necessary to do.)
(Also, just for the sake of completeness: I don’t get this vibe from Ajeya at all these days and have no complaints on this front, besides probably still some strategic disagreement around point 3, but at the level I have with many people I respect, almost certainly including you.)
When you wrote:
Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2)
I interpreted you as claiming that Ajeya had said “things more like:”
In general, people worried about AI risk should coordinate as much as possible to play down our concerns, so as not to look like alarmists. This is very important in order to build allies and accumulate political influence, so that we’re well-positioned to act if and when an important opportunity arises.
I don’t recall any examples of Ajeya saying or implying anything at all like that. I asked ChatGPT to try to find examples and I think it didn’t find anything.
In your ChatGPT session, a typical example it cites is:
In the AXRP discussion, she also says there were concerns that making the report seem too slick or official could increase capabilities interest.
I think those examples don’t meaningfully support the original claim, at least as a typical reader would understand it.
I have no interest in defending ChatGPT’s claims here, and feel like I caveated that pretty explicitly. I agree that quote is largely irrelevant.
I asked ChatGPT to try to find examples and it didn’t find anything.
Yep, I agree with you that ChatGPT did not find any clear quotes (though it doesn’t look like ChatGPT tried very hard to find quotes). I disagree that it didn’t find “anything at all like that” (indeed ChatGPT is quite explicit that it found some things “kind of like that”).
I don’t recall any examples of Ajeya saying or implying anything at all like that.
I do. As I said, I could go and dig them up, but it would take quite a while, and I am only like 75% confident they are written up, as opposed to being in conversations or private Google Docs or something else I would have trouble finding. It was a strong vibe I got at the time, and I remember having a few adjacent conversations, either with Ajeya or about Ajeya.
Let me know if you want me to do this. I don’t quite know what’s at stake here for you, and I feel somewhat like we are talking past each other, so before I do that it would be more productive to go up a meta-level, but I am not quite sure.