My position is NOT that LLMs are “stochastic parrots.” I suspect they are doing something akin to Solomonoff induction with a strong inductive bias in context—basically, they interpolate, pattern match, and also (to some extent) successfully discover underlying rules in the service of generalization.
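For reference, the standard definition of Solomonoff's universal prior (added here just to pin down the analogy, not a claim from the original): the probability of a sequence $x$ is a sum over all programs that output something beginning with $x$, weighted by description length,

$$M(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs } x\ldots} 2^{-\ell(p)},$$

where $U$ is a universal (prefix/monotone) machine and $\ell(p)$ is the length of program $p$ in bits. Only description length is penalized; how long a program runs costs nothing under this prior.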
I think non-reasoning models such as 4o and Claude are better understood as doing induction with a “circuit prior,” which is significantly different from Solomonoff (longer-running programs require larger circuits, which get penalized).
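To make that contrast concrete, here is one illustrative way to formalize a “circuit prior” (my gloss, assuming hypotheses are weighted by the size of the smallest circuit computing them; this is not a definition given in the original):

$$P_{\text{circuit}}(f) \;\propto\; 2^{-\,\mathrm{size}(C_f)},$$

where $C_f$ is the smallest circuit (e.g., of the fixed depth available in one forward pass) that computes $f$. Since simulating a program that runs for $T$ steps generically requires a circuit of size polynomial in $T$, the short-but-slow programs that Solomonoff favors get heavily down-weighted under a prior of this form.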
Reasoning models such as o1 and r1 are in some sense Turing-complete, and so much more akin to Solomonoff. Of course, the RL used in such models does not train on the prediction task, unlike Solomonoff induction.