Were you thinking of making the model good at answering questions whose correct answer depends on the model itself, like “When asked a question of the form x, what proportion of the time would you tend to answer y?”
I’m not an author of this post, so I don’t know.
I think one of the biggest dangers of this kind of self-awareness is that it lets models know their level of accuracy in particular areas. Right now, they may be overconfident or underconfident in their abilities, which makes their plans less effective when actually implemented: if a model is overconfident, a plan that relies on the overestimated ability will simply fail; if it is underconfident, it is not using all of its capabilities.
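To make the overconfident/underconfident distinction concrete, here is a toy sketch of a calibration check. The function and all the numbers are hypothetical illustrations, not measurements of any real model:

```python
# Toy sketch: compare a model's self-estimated success probabilities
# against its actual outcomes on a batch of tasks (hypothetical data).
def calibration_gap(self_estimates, outcomes):
    """Mean self-estimate minus actual success rate.

    Positive gap = overconfident; negative gap = underconfident.
    """
    predicted = sum(self_estimates) / len(self_estimates)
    actual = sum(outcomes) / len(outcomes)
    return predicted - actual

# A model that claims 90% accuracy but succeeds on 6 of 10 tasks
# is overconfident by 0.3:
gap = calibration_gap([0.9] * 10, [1, 1, 0, 1, 0, 1, 0, 1, 0, 1])
```

An overconfident model (positive gap) would build plans on abilities it lacks; an underconfident one (negative gap) would leave real capabilities unused.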