I think this is a specific example of language models articulating their own policies, which is an instance of introspection.
Sure, yeah, I’m totally on board with that, especially in light of the ‘Tell Me About Yourself’ work (which is absolutely fascinating). I was mostly just listing it for completeness there.
I think b) doesn’t need to be true: responding in “hello” acrostics is just different from how any typical English speaker responds. Ditto for c): responding in acrostics is probably the main way in which it’s different from the typical English speaker.
Maybe. Although, as I pointed out above in my response to dirk, that seems less likely to be true in some of the cases in TMAY. E.g., the training data surely contains many examples of both risk-loving and risk-averse people, so I suspect that isn’t the whole story.