Models know what they are doing (but it’s hidden)

I ran ~500 experiments on vocabulary-activation correspondence in self-referential processing. Paper and data are on Zenodo: https://zenodo.org/records/18568344.

The figure in the header shows a key result. Happy to discuss.

I am seeking an arXiv endorsement (cs.AI); DM me if you can help.
