I think there’s one structure-of-plan that is sort of like your outline. (I think it is similar to John Wentworth’s plan, but skipping ahead past some steps and being more specific about the final solution, which means more wrong.)
(I don’t think John self-identifies as particularly oriented around your “4 steps from AI control to automate alignment research”. I haven’t heard the people who say ‘let’s automate alignment research’ say anything that sounded very coherent. I think many people are thinking something like “what if we had a LOT of interpretability?” but IMO not really thinking through the next steps needed for that interpretability to be useful in the endgame.)
STEM AI → Pivotal Act
I haven’t heard anyone talk about this for a while, but a few years back I heard a cluster of plans that were something like: “build a STEM AI with a very narrow ability to think, which you could be confident couldn’t model humans at all and would only think about resources inside a 10′ by 10′ cube; use that to invent the prerequisites for uploading or biological intelligence enhancement; and then ??? → very smart humans running at fast speeds figure out how to invent a pivotal technology.”
I don’t think the LLM-centric era lends itself well to this plan. But I could see a route where you train a less-robust-and-thus-necessarily-weaker STEM AI on a careful STEM corpus, run it under careful control, and ask it only carefully scoped questions; that could maybe let you get away with more power than you could for more generically competent LLMs.
Yes, a human-uploading or human-enhancing pivotal act might actually be something people are thinking about. Yudkowsky gives his nanotech-GPU-melting pivotal act example, which—while he has stipulated that it’s not his real plan—still anchors “pivotal act” on “build the most advanced weapon system of all time and carry out a first-strike”. This is not something that governments (and especially companies) can or should really talk about as a plan, since threatening a first-strike on your geopolitical opponents does not a cooperative atmosphere make.
(though I suppose a series of targeted, conventional strikes on data centers and chip factories across the world might be on the Pareto frontier of “good” vs. “likely” outcomes)