I believe that AI (including AGI and ASI) can do the same and be a positive force for humanity. I also believe that it is possible to solve the “technical alignment” problem and build AIs that follow the words and intent of our instructions and report faithfully on their actions and observations.
I will not defend these two claims here. However, even granting these optimistic premises, AI’s positive impact is not guaranteed.
It’s interesting that you describe the claims “AI can be a positive force for humanity” and “technical alignment can be solved” as optimistic premises. I think people who think misalignment risks are catastrophically high (e.g. a >25% chance of literal AI takeover) would agree with these premises (they think misalignment risks could be avoided, it’s just that we’re at least somewhat likely to not succeed at this), so these claims don’t really distinguish optimists from pessimists (at least with respect to worries around misalignment risks).
Perhaps when you say “it is possible to solve” you mean “we’re very likely to solve it in practice given the realistic time and budget we will have for this problem”. In this case, there certainly is disagreement!
I’m not sure whether you were trying to highlight typical disagreements or if you introduced things in this way for some other reason.
You might imagine that, by positing that technical alignment is solvable, I “assumed away” all the potential risks of AI.
I certainly don’t think so! As you note, solvable does not imply that it will be solved!
Thanks! Note that I did not optimize this essay just for the LessWrong audience, so different people might have different points of agreement and disagreement.
I think I am indeed more optimistic, but I would not say that we will solve it “by default.” As I say in the essay, I don’t think that simply letting the market work will get us a sufficient level of alignment. We need to make an effort; this is similar to the example of Microsoft that I mentioned in this comment, but I hope that, unlike the Microsoft case, we don’t need to wait for huge failures before we act.