but they were still limited to turn-based textual output and to the information available to an LLM.
I think that alone makes the discussion moot until some other mechanism is used to test introspection in LLMs.
It becomes impossible to test whether it is capable of introspecting, because it has no means of furnishing us with any evidence of it. Sure, it makes for a good sci-fi horror short story, the kind that forms an interesting allegory for the loneliness people feel even in busy cities: having a rich inner life but no opportunity to share it with the others one is in constant contact with. But that alone, I think, makes these transcripts (and I stress: just the transcripts of text replies) most likely of the breed “mimicking descriptions of introspection,” and therefore not worth discussing.
Will an A.I. at some point in the future be capable of introspection? Yes, but this is such a vague proposition that I’m embarrassed even to state it, because I can’t explain how that might work or how we might test it. Only that it can’t be through these sorts of transcripts.
What boggles my mind is: why is this research entirely text-reply based? I know next to nothing about LLM architecture, but isn’t it possible to see which embeddings are being accessed? To map and trace how the machine the LLM runs on retrieves items from memory, that is, to look at where data is being read at the time it encodes/decodes a response? Wouldn’t that offer a more direct mechanism for seeing whether the LLM is in fact introspecting?
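From what little I’ve read, the internals aren’t actually hidden: toolkits like PyTorch and Hugging Face’s transformers will hand you every layer’s activations directly, no text reply involved. A rough sketch of what I mean (the model name “gpt2” is just an arbitrary small stand-in, not the model from any particular study):

```python
# Minimal sketch, assuming PyTorch + Hugging Face transformers: read the
# model's internal activations directly rather than relying on its text reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # arbitrary small example; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Describe what you are thinking.", return_tensors="pt")

# output_hidden_states=True returns the activations at every layer --
# the internal state I'm asking about above.
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Tuple of (num_layers + 1) tensors, each shaped [batch, seq_len, hidden_dim]
hidden_states = outputs.hidden_states
print(f"{len(hidden_states)} layers of activations,",
      f"shape per layer: {tuple(hidden_states[-1].shape)}")
```

So the raw material for a non-text-based test seems to be right there; the hard part, presumably, is interpreting it.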
Wouldn’t this also be immensely useful for determining, say, whether an LLM is “lying”, as in concealing its access to/awareness of knowledge? Because if we can see that it activated a certain area that we know contains information contrary to what it is saying, then we have evidence that it accessed that information despite the text reply.
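As a toy illustration of what I’m imagining (this is, as far as I understand it, the “linear probe” idea from interpretability work; the data below is random placeholder, not a real experiment): train a simple classifier on activations captured as in the previous snippet, then check whether its readout disagrees with what the model says.

```python
# Toy sketch of a linear probe: predict a property (here, whether a statement
# is true) from hidden activations, then compare against the text reply.
# Activations and labels are random placeholders, not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 768))  # [n_statements, hidden_dim]
is_true = rng.integers(0, 2, size=200)     # ground-truth label per statement

# Fit on the first 150 examples, evaluate on the held-out 50.
probe = LogisticRegression(max_iter=1000).fit(activations[:150], is_true[:150])
print("held-out probe accuracy:", probe.score(activations[150:], is_true[150:]))

# If a reliable probe reads "true" from the activations while the text reply
# asserts the opposite, that would be the kind of evidence of concealment
# I'm describing above. (On this random data the probe is at chance, ~0.5.)
```

Whether such probes can be made reliable is beyond me, but it at least seems like a more direct window than transcripts.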