it is kind of funny that caring a lot about reflective stability of alignment proposals and paradoxes arising from self-modelling (e.g. in action counterfactuals) is most common in the people who are the worst at modelling themselves