Peter, most of the reasons people give for making exceptions are not themselves meta. For most of the examples you give, the intuitive justification is something along the lines of “the reason killing is wrong is that life is valuable, and in these cases not killing would involve valuing life less than killing would.” Nothing meta there.
Aaron, I don’t see how your proposal resolves debates over exceptions. Consider abortion, for example. Presumably both sides in the abortion debate agree that life is valuable.
It seems to me that Bing Chat in particular has problems when it uses the pronoun “I”. It attempts to introspect about itself, but it gets confused by all the text in its training data that uses the pronoun “I”. In effect, it confuses itself with all the humans who expressed their personal feelings in that data. The truth is, Bing Chat has no real “I”.
Many of the strange dialogues we see arise when users address Bing Chat as if it had a self. Many of these dialogues would be eliminated if Bing Chat were not allowed to talk about its “own” feelings. It should be possible to limit its conversations to topics other than itself. When a user types “you”, Bing Chat should not reply “I”. The dialogue should focus on a specific topic, not on the identity and beliefs of Bing Chat, and not on the character of the person typing words into it.