Would it be useful to think about (pre-trained) LLMs as approximating the wave function collapse algorithm? (the one from game dev, not the quantum stuff)
Logits as partially solved constraints after a finite compute budget, and the output as a mostly-random-but-weighted-towards-most-likely sample, taken without ever collapsing it fully, without backtracking, and with each node evaluated to a random level of precision. Basically a somewhat stupid way to sample from that data structure: if you don’t follow it up by fixing the violated constraints and only keep the first pass of a quick heuristic, there will be incompatible nodes next to each other… as in hallucinations, harmful mixing of programming paradigms in the same codebase, and 80%-good-enough stuff that could not possibly be precise in edge cases.
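To make the analogy concrete, here is a toy sketch, not a claim about how any real LLM or WFC library works. Everything in it is made up for illustration: the three tiles, the "sea may not touch land" rule, and the `leak` parameter, which stands in for constraints that are only partially solved. `one_pass_sample` keeps the first pass of a soft heuristic; `wfc_sample` collapses one cell at a time and propagates the consequences before sampling the next.

```python
import random

TILES = ["sea", "coast", "land"]
# Hypothetical symmetric adjacency rule: sea may not touch land directly.
ALLOWED = {
    ("sea", "sea"), ("sea", "coast"), ("coast", "sea"),
    ("coast", "coast"), ("coast", "land"), ("land", "coast"),
    ("land", "land"),
}

def count_violations(cells):
    """Number of adjacent pairs that break the adjacency rule."""
    return sum((a, b) not in ALLOWED for a, b in zip(cells, cells[1:]))

def one_pass_sample(n, rng, leak=0.3):
    """Left-to-right weighted sampling, no propagation, no backtracking.

    `leak` is the weight left on tiles that conflict with the previous
    cell (an assumption of this toy, standing in for soft constraints).
    """
    cells = [rng.choice(TILES)]
    for _ in range(n - 1):
        weights = [1.0 if (cells[-1], t) in ALLOWED else leak for t in TILES]
        cells.append(rng.choices(TILES, weights=weights)[0])
    return cells

def propagate(domains, start):
    """Remove options that have no compatible option in a neighbouring cell."""
    stack = [start]
    while stack:
        i = stack.pop()
        for j in (i - 1, i + 1):
            if 0 <= j < len(domains):
                supported = {t for t in domains[j]
                             if any((s, t) in ALLOWED for s in domains[i])}
                if supported != domains[j]:
                    domains[j] = supported
                    stack.append(j)

def wfc_sample(n, rng):
    """Collapse the most constrained open cell, then propagate, until done."""
    domains = [set(TILES) for _ in range(n)]
    while any(len(d) > 1 for d in domains):
        open_cells = [i for i, d in enumerate(domains) if len(d) > 1]
        # Fewest remaining options first, ties broken at random.
        i = min(open_cells, key=lambda k: (len(domains[k]), rng.random()))
        domains[i] = {rng.choice(sorted(domains[i]))}
        propagate(domains, i)
    return [next(iter(d)) for d in domains]
```

Calling `count_violations(wfc_sample(200, random.Random(0)))` gives 0, because on a 1D chain propagation alone is enough to keep every remaining option consistent. `count_violations(one_pass_sample(200, random.Random(0)))` can be nonzero whenever `leak > 0`. On 2D grids real WFC can still run into contradictions and needs restarts or backtracking, so the toy understates the hard part.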
And stuff like RLHF or RLVR will still only improve the first-pass heuristic, not actually fix the inconsistencies… “Agentic” scaffolds for coding assistants, with multiple passes, running the linters and tests, and multiple rounds of “does it make sense”, sound like they should be helpful, but doing it in tokens instead of logits (where the actual constraints live before collapsing them to a quasi-random instantiated sample) sounds… inefficient?
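A back-of-the-envelope version of the "inefficient?" worry, for the crudest possible token-level repair only: regenerate the whole sample until every check passes. The assumptions are mine and deliberately simplistic: $m$ local constraints, each violated independently with probability $p$ on a first pass. Real scaffolds repair locally rather than regenerating everything, so this is an upper bound on the pain, not a measurement.

```latex
P(\text{clean first pass}) = (1-p)^m,
\qquad
\mathbb{E}[\text{attempts until clean}] = \frac{1}{(1-p)^m}
% Example: p = 0.01, m = 500 gives (0.99)^{500} \approx 0.0066,
% i.e. roughly 150 full regenerations on average.
```

So even a 99%-good first-pass heuristic rarely produces a fully consistent large sample on its own, which is why the fixing step has to be targeted at the violated constraints rather than at the whole output.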