Maybe I don’t understand exactly what your point is, but I’m not convinced. AFAIK, it’s true that GPT has no state outside of the list of tokens so far. Contrast this with your jazz example, where you do, in fact, have hidden thoughts beyond the notes played so far. I think this is what Wolfram and others mean when they say that “GPT predicts the next token”. You highlight “it doesn’t have a global plan about what’s going to happen”, but I think a key point is that whatever plan it has, it has to build up entirely from “Once upon a”, then again, from scratch, at “Once upon a time,”, and again and again. Whatever plan it makes is derived entirely from “Once upon a time,” and could well change dramatically at “Once upon a time, a”, even if “ a” was its predicted token. That’s very different from the kind of global plan a human writing a story makes.
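To make the “rebuilt from scratch” point concrete, here is a minimal toy sketch (not a real language model; `next_token` is a hypothetical stand-in for a forward pass). The only input at each decoding step is the token prefix so far, so anything plan-like has to be recomputed from that prefix every single step:

```python
def next_token(prefix):
    # Hypothetical stand-in for a model forward pass: a pure function
    # of the prefix alone. No hidden state is carried between calls.
    table = {
        ("Once", "upon", "a"): "time,",
        ("Once", "upon", "a", "time,"): "a",
    }
    return table.get(tuple(prefix), "<end>")

def generate(prompt, max_steps=5):
    tokens = list(prompt)
    for _ in range(max_steps):
        tok = next_token(tokens)  # recomputed from the full prefix each step
        if tok == "<end>":
            break
        tokens.append(tok)
    return tokens

print(generate(["Once", "upon", "a"]))
# -> ['Once', 'upon', 'a', 'time,', 'a']
```

Nothing persists across iterations of the loop except the growing token list itself, which is the contrast with the jazz player, whose hidden thoughts do persist between notes. (Real transformer implementations cache attention keys/values between steps, but that cache is purely a speed optimization: it is a deterministic function of the prefix, so it changes nothing about this picture.)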
The intuition of “just predicting one token ahead” yields useful explanations, such as why the strategy of having the model explain its reasoning first and only then give its answer works. I don’t see how this post fits with that observation, or what other observations it clarifies.
It seems clear to me that it is a very bad example. I find that consistently the worst part of Eliezer’s non-fiction writing is that he fails to separate contentious claims from writings on unrelated subjects. Moreover, he usually discards the traditional view as ridiculous rather than admitting that its incorrectness is extremely non-obvious. He goes so far in this piece as to give the standard view a straw-man name and to state only the most laughable of its proponents’ justifications. This mars an otherwise excellent piece and I am unwilling to recommend this article to those who are not already reading LW.