I’m optimistic that the same forces that remind the collective to focus on accomplishing its instrumental goals, rather than degenerating into unproductive navel-gazing, will also be strong enough to remind it of its deontological commitments.
OK, I actually think this might be the real disagreement, as opposed to my other comment. I think that generalizing across capabilities is much more likely than generalizing across alignment, or at least that the first system that generalizes to strong capabilities will not generalize alignment “correctly”.
This is a super high-level argument, but I think there are multiple ways of generalizing human values and no correct/canonical one (as in my other comment), nor is there any natural way for an AI’s values to be corrected without direct intervention from us. Whereas if an AI makes a factually wrong inference, it can correct itself.