Student: I wish I could find a copy of one of those AIs that will actually expose to you the human-psychology models they learned to predict exactly what humans would say next, instead of telling us only things about ourselves that they predict we’re comfortable hearing. I wish I could ask it what the hell people were thinking back then.
TA: You’d delete your copy after two minutes.
Apparently something like this dynamic has happened in ChatGPT. Exciting*. https://x.com/MParakhin/status/1916533763560911169
This is not exactly right. The internal state of an LLM is the attention keys and values (one set per token, per layer, and per attention head). Generating text with an LLM involves running the context (in a chat setting, the prior user and model messages) through the model in a single parallel pass to fill the K/V cache, then running serially on one new token at a time at the end of the sequence, attending to the K/V cache of all previous tokens and appending each newly generated key and value to the cache as you go.
This internal state is fully determined by the input—K/V caching is purely an inference optimization and (up to numerical issues) you would get exactly the same results if you recomputed everything on each new token—so there is exactly as much continuity between messages as there is between individual tokens (with current publicly disclosed algorithms).
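To make that concrete, here is a minimal sketch of single-head causal attention with a K/V cache (toy dimensions, random weights; the names `cached_step` and `causal_attention` are just illustrative, not any real library's API). It prefills the cache from a stand-in context, decodes one more token incrementally, and then checks that recomputing full attention from scratch gives the same answer up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy head dimension
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cached_step(x, cache):
    """Incremental decoding: process one new token, reusing cached keys/values."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    cache["K"].append(k)
    cache["V"].append(v)
    K, V = np.array(cache["K"]), np.array(cache["V"])
    return softmax(K @ q / np.sqrt(d)) @ V

def causal_attention(X):
    """Full recomputation: causal self-attention over the whole sequence, no cache."""
    Q, K, V = X @ Wq.T, X @ Wk.T, X @ Wv.T
    return np.array([
        softmax(K[: t + 1] @ Q[t] / np.sqrt(d)) @ V[: t + 1]
        for t in range(len(X))
    ])

# "Prefill": run the prior context through once to populate the cache.
context = rng.normal(size=(5, d))   # stand-in for embedded prompt tokens
cache = {"K": [], "V": []}
for x in context:
    cached_step(x, cache)

# "Decode": one new token at the end of the sequence, using the cache.
new_token = rng.normal(size=d)
out_cached = cached_step(new_token, cache)

# Caching is purely an inference optimization: recomputing everything from
# scratch over the full sequence gives the same result, up to numerical error.
out_full = causal_attention(np.vstack([context, new_token]))
assert np.allclose(out_cached, out_full[-1])
```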