But if your definition of alignment is “an AI that does things in a way such that all humans agree on its ethical choices,” I think you’re doomed from the start, so this counterintuition proves too much. I don’t think there is an action an AI could take or a recommendation it could make that would satisfy that criterion (in fact, many people would say that the AI by its nature shouldn’t be taking actions or making recommendations at all).
It seems like something like “An AI that acts and reasons in a way that most people who are broadly considered moral consider moral” would be a pretty good outcome.
I don’t think it’s that weak?