This isn’t to say this other paradigm will be safer, just that a narrow description of “current techniques” doesn’t include the default trajectory.
Sorry, this seems wild to me. If current techniques seem lethal, and future techniques might be worse, then I’m not sure what the point is of pointing out that the future will be different.
But if these earlier AIs were well aligned (and wise, with reasonable epistemics), I think it’s pretty unclear that the situation would go poorly, and I’d guess it would go fine, because these AIs would themselves develop much better alignment techniques. This is my main disagreement with the book.
I mean, I also believe that if we solve the alignment problem, then we will no longer have an alignment problem, and I predict the same is true of Nate and Eliezer.
Is your current sense that if you and Buck retired, the rest of the AI field would successfully deliver on alignment? Like, I’m trying to figure out whether your sense of the default is “your research plan succeeds” or “the world without your research plan”.
By “superintelligence” I mean “systems which are qualitatively much smarter than top human experts”. (If Anyone Builds It, Everyone Dies seems to define ASI in a way that could include weaker levels of capability, but I’m trying to refer to what I see as the typical usage of the term.)
Sometimes, people say that “aligning superintelligence is hard because it will be much smarter than us”. I agree; this seems to make aligning superintelligence much harder, for multiple reasons.
Correspondingly, I’m noting that if we can align earlier systems which are just capable enough to obsolete human labor (which IMO seems way easier than directly aligning wildly superhuman systems), these systems might be able to keep aligning their successors on an ongoing basis. I wouldn’t consider this “solving the alignment problem”, because we’d instead just have aligned a particular non-ASI system in a non-scalable way, in the same way I don’t consider “Claude 4.0 Opus is aligned enough to be pretty helpful and not plot takeover” to be a solution to the alignment problem.
Perhaps your view is “obviously it’s totally sufficient to align systems which are just capable enough to obsolete current human safety labor, so that’s what I meant by ‘the alignment problem’”. I don’t personally think this is obvious given race dynamics and limited time (though I do think it’s likely to suffice in practice). Minimally, people often seem to talk about aligning ASI (which I interpret to mean wildly superhuman AIs rather than human-ish level AIs).