Models know what they are doing (but it’s hidden)
I ran ~500 experiments on vocabulary-activation correspondence in self-referential processing. Paper and data on Zenodo: https://zenodo.org/records/18568344.
The figure in the header shows a key result. Happy to discuss.
I am seeking an arXiv endorsement (cs.AI). DM me if you can help.