I think there might exist people who feel that way (e.g. reactors above) but Yudkowsky/Soares, the most prominent doomers (?), are on the record saying they think alignment is in principle possible, e.g. opening paragraphs of List of Lethalities. It feels like a disingenuous strawman to me for Dario to dismiss doomers with.
Coming back to Amodei’s quote, he says (my emphasis):
The idea that these models have dangers associated with them, including dangers to humanity as a whole, that makes sense to me. The idea that we can kind of logically prove that there’s no way to make them safe, that seems like nonsense to me.
So “them” in “there’s no way to make them safe” refers to LLMs, not to all possible AGI methods. Yudkowsky-2022 in List of Lethalities does indeed claim that AGI alignment is in principle possible, but doesn’t claim that AGI-LLM alignment is in principle possible. In the section you link, he wrote:
The metaphor I usually use is that if a textbook from one hundred years in the future fell into our hands, containing all of the simple ideas that actually work robustly in practice, we could probably build an aligned superintelligence in six months.
My mainline interpretation is that LLMs are not a “simple idea that actually works robustly in practice”, and that the imagined textbook from the future would contain different ideas instead. List of Lethalities isn’t saying that AGI-LLM alignment is impossible, but it also isn’t saying that it is possible.
(Though it’s still arguably hyperbole to say “kind of logically prove”.)