Seth, I forget where you fall in the intent alignment typology: if we build a superintelligent AI that follows instructions in the way you imagine, can we just give it the instruction “Take autonomous action to do the right thing,” and then it will just go do good stuff without us needing to continue interacting with it in the instruction-following paradigm?
I am definitely thinking of IF as it applies to systems with the capability for unlimited autonomy. Intent alignment as a concept doesn’t end at some level of capability—although I think we often assume it would.
How it would understand “the right thing” is the question. But yes, intent alignment as I’m thinking of it does scale smoothly into value alignment plus corrigibility if you can get it right enough.