That seems fair, and I appreciate the clarification. The plan isn’t to have the AI do your homework, but to hire it, alongside people, to help with that homework.
The more reasonable (and IMO common) form of the worry is that developers, including GDM, don’t seem to have a plan that extends to really-dangerous AI. So by default they’ll wind up leaning heavily on AI to “do the homework.” That would be risky, for fairly obvious reasons.[1]
Leaning on AI assistance seems more likely to work if we make differential progress in reliability. I recently laid out how Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities, and discussed how such skills might make LLMs actually helpful for the conceptual challenges of alignment.
I can see good reasons that GDM’s full alignment plan wouldn’t be made public.
[1] One worry about leaning on AI assistance for conceptual alignment research is summed up in Wentworth's The Median Doom-Path: Slop, not Scheming. Smarter-than-human but sloppy AI is likely to produce convincing but faulty conceptual work. And of course scheming is still a risk if the alignment, interpretability, amplified oversight, and control measures aren't done carefully enough. I worry that pressure for progress could make it very hard to be careful enough, despite best intentions.