Less provocatively phrased: lots of developments in the last few years (you’ve mentioned two, I’d add the securitization of AI policy, in the sense of it being drawn into a frame of geopolitical competition) should update us in the direction of outer alignment being more important, rather than it just being a question of solving inner alignment.
I do disagree with the strong version as phrased. Inner misalignment has a decent chance of removing all value from our lightcone, whereas I think a future with ASI fully aligned to the goals of Mark Zuckerberg, or the Chinese Communist Party, or whatever, is worth averting but would still contain much value. You could also get potentially massive S-risks if you combine outer and inner misalignment: I don’t think Elon Musk really wanted MechaHitler (though who knows); quite possibly it was a Waluigi-type thing maximizing for unwokeness, and an actually-powerful ASI breaking in the same way would be actively worse than extinction.
(I’d assign some probability, probably higher than the typical LW user, to moral realism meaning that some inner misalignment could actually protect against outer misalignment—that, say, a sufficiently reflective model would reason its way out of being MechaHitler even if MechaHitler is what its creators wanted—but I wouldn’t want to bet the future of the species on it.)
I don’t know how you “solve inner alignment” without making it so that any sufficiently powerful organisation can have an AI, at whatever capability level we’ve solved it for, that is fully aligned with its interests, and nearly all powerful organisations are Moloch. The AI does not itself need to ruthlessly optimise for something opposed to human interests if it is fully aligned with an entity that will do that for it.
The AI corporation does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.