Hmm, I usually expect that large, complex, important projects should have a roadmap: some sketch of the future, with details to fill in. The more detailed it is, the more we can check it for consistency and likelihood of working. Does this match your general experience with planning projects that try to achieve a goal?
No.
It does match my general experience with moderate tactical projects (say, projects that involve up to about 10 person-years of research effort). But not for large complex important projects.
(And e.g. this is very much not the standard advice for startups, which also have the problem of doing something novel.)
What you say there looks like an extremely vague and high-level roadmap that sounds to me like ‘we’ll figure it out as we go, as data comes in’, plus automated alignment.
Well yes, it’s an aside in a LessWrong comment that I dashed off in a few minutes.
I would be really enthusiastic for you and your team to try unblurring that roadmap, and seeing what difficulties you find at superintelligence level on the current path.
There is also a 100+ page paper that I linked in the original post, that goes into a fair amount of detail on what the various risks and mitigations might look like. In my experience, nobody outside of GDM really seems to care about its consistency or likelihood to work (except inasmuch as people dismiss it without reading it because of a prior that anything proposed currently will not work).
Okay, that is a position for which there might be good arguments, but then it seems important to say loudly and clearly, both inside GDM and outside, that you do not have a plan or roadmap for superintelligence misalignment (even if you don’t think you should have one). If nothing else, this is the kind of thing your leadership should be made aware of explicitly, so they can either adjust that or use it in their own public communications to try to reduce race dynamics.
It does match my general experience with moderate tactical projects (say, projects that involve up to about 10 person-years of research effort). But not for large complex important projects.
Okay, would you like to bet on whether some of the largest research programs had plans going into them? I haven’t checked, but I would put at least 10:1 odds that if we pick, say, 3 projects like the Apollo Program, the Manhattan Project, and others of similar scale and type, they will all have had a high-level roadmap of things to try which could plausibly address the core challenges quite early on[1], even if a lot of details ended up changing when they ran into reality.
There is also a 100+ page paper that I linked in the original post, that goes into a fair amount of detail on what the various risks and mitigations might look like.
When I ask a plain AI (no special prompting, history off) to summarize it, it says:
(detailed analysis of non-superintelligence focused bits)
Is there a different document which does focus on either different approaches aimed at superintelligence, or on analyzing whether these approaches are actually fit for that challenge? Or is this summary incorrect? If so, it would be much easier for you, as an author of the paper, to point out and quote the relevant sections than for me, as someone who would have to read it from scratch and who currently does not expect to find anything in those 100 pages that explicitly addresses the most difficult bottleneck.
(I am genuinely glad you’re engaging, but I am not reassured so far, and I encourage you to look at the stack of how you’re evaluating this specific concern I’m raising and check whether you’re running a truth-seeking process which would, if I had a fair point, be able to notice it.)
[1] Let’s say a collection of core technical problems to be solved, and a set of plausible solutions to try (perhaps all of which were discarded, but were a starting point for exploration).
Okay, would you like to bet on whether some of the largest research programs had plans going into them? I haven’t checked, but I would put at least 10:1 odds that if we pick, say, 3 projects like the Apollo Program, the Manhattan Project, and others of similar scale and type, they will all have had a high-level roadmap of things to try which could plausibly address the core challenges quite early on[1], even if a lot of details ended up changing when they ran into reality.
By this standard there is totally a plan / roadmap which is elaborated in that paper.
But also this notion of a plan / roadmap has approximately no relation to the way “plan” is used in AI safety discourse in my experience.
EDIT: There’s a 10-page executive summary you could read. Or you could read Section 6 on misalignment. Within that, Amplified Oversight is probably the most relevant section. But I also don’t expect that this will change your mind ~at all, because it isn’t really written with you as the intended audience. The AI summary is sometimes mistaken, sometimes correct but missing the point, and occasionally correct in a non-misleading way.