A language model can’t do anything on its own. It just says things. But we can design a system that takes what the model says as input to transparent information processing in natural language, and the eventual output of that system can be actions in the physical world.
Whether the language model has any hidden intentions is less relevant: only what it actually says starts the causal process that ends with the whole system acting in the physical world. This isn’t confusing the citation for the referent when the citation is what actually matters.
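As a minimal sketch of this architecture, consider a controller that treats the model purely as a text source: every utterance is logged in plain language, and only explicitly whitelisted, parseable commands ever become actions. All names here (`query_model`, `Controller`, `ACTIONS`) are hypothetical illustrations, not any particular system’s API.

```python
from typing import Callable, Dict, List, Optional


def query_model(prompt: str) -> str:
    """Stand-in for a language model: it can only return a string."""
    return "ACTION: water_plant"  # hypothetical reply; the model merely *says* this


class Controller:
    """Transparent middle layer: every decision is recorded as plain text."""

    def __init__(self, actions: Dict[str, Callable[[], str]]):
        self.actions = actions
        self.log: List[str] = []  # auditable natural-language trace

    def step(self, prompt: str) -> Optional[str]:
        utterance = query_model(prompt)
        self.log.append(f"model said: {utterance!r}")
        # Only whitelisted commands in an explicit format become actions.
        if utterance.startswith("ACTION: "):
            name = utterance[len("ACTION: "):]
            if name in self.actions:
                self.log.append(f"executing whitelisted action: {name}")
                return self.actions[name]()
        self.log.append("no action taken")
        return None


# The only way this system affects the physical world:
ACTIONS = {"water_plant": lambda: "pump ran for 5s"}

ctl = Controller(ACTIONS)
result = ctl.step("Should the plant be watered?")
print(result)   # → pump ran for 5s
print(ctl.log)  # the full natural-language record of what happened and why
```

The point of the sketch: the model’s hidden state never touches the actuator. Whatever it “intends,” only the emitted string enters the pipeline, and the pipeline’s reasoning is inspectable text at every step.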