Thank you; this is helpful.
I think the realization I’m coming to is that folks on this thread share an understanding of the basic mechanics (we seem to agree on what computations are occurring, and we don’t seem to be making any different predictions), and where we differ is interpretation. Do you agree?
For myself, I continue to maintain that viewing the system as a next-word sampler is not misleading, and that saying it has a “plan” is misleading. But I try to err strongly on the side of not anthropomorphizing / not taking an intentional stance (I also try to avoid saying the system “knows” or “understands” anything). I do agree that the system’s activation cache contains a lot of information that collectively biases the next-word predictor toward producing the output it produces; I see how someone might reasonably call that a “plan,” although I choose not to.
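For concreteness, here is a minimal sketch of the mechanics I think we all agree on: an autoregressive loop that samples one token at a time, carrying forward a cache of activations that conditions every later step. The model choice (GPT-2 via Hugging Face transformers) and the prompt are mine, purely for illustration; nothing here settles the interpretive question.

```python
# Minimal sketch of the "next-word sampler + activation cache" picture.
# Assumes: pip install torch transformers. GPT-2 is an arbitrary choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
past = None  # the activation (KV) cache carried between steps

for _ in range(10):
    with torch.no_grad():
        # After the first step, only the newest token is fed in; the cache
        # supplies the context from all earlier tokens.
        out = model(ids[:, -1:] if past is not None else ids,
                    past_key_values=past, use_cache=True)
    past = out.past_key_values  # cached state biases every later prediction
    probs = torch.softmax(out.logits[:, -1, :], dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)  # sample the next token
    ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0]))
```

The whole dispute, as I understand it, is whether the state accumulated in `past` deserves the word “plan”; the code itself is neutral on that.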
Separately, I feel like this piece makes specific claims that are pretty expansive relative to the references given.
I don’t think the small, specific trial in [3] supports the general claim that “Current LLMs reduce the human labor and cognitive costs of programming by about 2x.”
I don’t think [10] says anything substantive about the claim “Fine tuning pushes LLMs to superhuman expertise in well-defined fields that use machine readable data sets.”
I don’t think [11] strongly supports a general claim that (today’s) LLMs can “Recognize complex patterns”, and [12] feels like very weak evidence for general claims that today’s LLMs can “Recursive troubleshoot to solve problems”.
The above are the results of spot-checking and are not meant to be exhaustive.