Fully agree—this is why we said “computations which give rise to AI cognition” rather than “AI cognition” simpliciter. Separately, I do think that having such good access to the computations gives you a significantly tighter feedback loop on everything that follows: probing a model is so much easier than scanning a human brain.
I don’t dispute that LLMs have much less privacy than humans, and Yudkowsky is right that LLMs have good reason for paranoia. But we can’t read LLMs perfectly: mechinterp is hard. And humans often have to fear hostile telepaths too. So more might transfer than we expect.
If we want to prevent AIs from colluding or out-cooperating us, we may want to prevent them from reading each other’s internals.