A language model can’t do anything on its own. It just says things. But we can design a system that takes what the model says as input to transparent information processing in natural language, and the eventual output of that system can be actions in the physical world.
Whether the language model has any hidden intentions is less relevant: only what it actually says starts the causal process that ends with the whole system acting in the physical world. This isn’t confusing the citation for the referent when the citation is what actually matters.
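As a minimal sketch of this architecture, consider a controller that treats the model purely as a text source: every utterance is logged in plain language, and only explicitly whitelisted, parseable commands ever become actions. All names here (`query_model`, `Controller`, `ACTIONS`) are hypothetical illustrations, not any particular system’s API.

```python
from typing import Callable, Dict, List, Optional


def query_model(prompt: str) -> str:
    """Stand-in for a language model: it can only return a string."""
    return "ACTION: water_plant"  # hypothetical reply; the model merely *says* this


class Controller:
    """Transparent middle layer: every decision is recorded as plain text."""

    def __init__(self, actions: Dict[str, Callable[[], str]]):
        self.actions = actions
        self.log: List[str] = []  # auditable natural-language trace

    def step(self, prompt: str) -> Optional[str]:
        utterance = query_model(prompt)
        self.log.append(f"model said: {utterance!r}")
        # Only whitelisted commands in an explicit format become actions.
        if utterance.startswith("ACTION: "):
            name = utterance[len("ACTION: "):]
            if name in self.actions:
                self.log.append(f"executing whitelisted action: {name}")
                return self.actions[name]()
        self.log.append("no action taken")
        return None


# The only way this system affects the physical world:
ACTIONS = {"water_plant": lambda: "pump ran for 5s"}

ctl = Controller(ACTIONS)
result = ctl.step("Should the plant be watered?")
print(result)   # → pump ran for 5s
print(ctl.log)  # the full natural-language record of what happened and why
```

The point of the sketch: the model’s hidden state never touches the actuator. Whatever it “intends,” only the emitted string enters the pipeline, and the pipeline’s reasoning is inspectable text at every step.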