Seth, I forget where you fall in the intent alignment typology: if we build a superintelligent AI that follows instructions in the way you imagine, can we just give it the instruction “Take autonomous action to do the right thing,” and then it will just go do good stuff without us needing to continue interacting with it in the instruction-following paradigm?
I am definitely thinking of IF as it applies to systems with the capability for unlimited autonomy. Intent alignment as a concept doesn’t end at some level of capability—although I think we often assume it would.
How it would understand “the right thing” is the question. But yes, intent alignment as I’m thinking of it does scale smoothly into value alignment plus corrigibility if you can get it right enough.