A lot of the data is actually the same, but a lot of it is actually different! Sure, chemistry works the same on Earth and Qo’noS. But in addition to vinegar and baking soda, Earth is full of humans doing human things, and Qo’noS is full of Klingons doing Klingon things.
If you want to predict how a human would respond to a moral dilemma, the English LLM can do that, because the simplest program (with respect to the neural network prior) that predicts English webtext needs to be able to predict human moral judgements. The Klingon LLM can’t; it doesn’t know anything about humans.
To be sure, the prediction about the human’s choice is, in terms of agent foundations theory, “prediction” and not “steering”. The LLM doesn’t autonomously want to do the right thing. With the right prompt, it could just as easily predict what fictional Romulans would do (because webtext contains a lot of fiction about Romulans) or the results of chemistry reactions (because there’s a lot of webtext about chemistry).
But predictions can be used for steering. With careful prompting or reinforcement learning, the English LLM can respond to a description of a moral dilemma with a pretty good prediction of how a human would respond to the dilemma, and that text can be used to trigger actions in the world, for example, via a command-line interface. That’s real steering (the CLI command executed depends on the dilemma by means of the prediction about the human’s response) that the Klingon LLM can’t do.
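To make the “prediction used as steering” point concrete, here’s a minimal Python sketch of that loop. The `query_llm` stub and the two shell scripts are hypothetical placeholders (not any particular provider’s API or any real script); the only point is that which command runs depends on the model’s predicted human response, not on any goal baked into the model.

```python
import subprocess

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.
    Stubbed to a fixed answer so the sketch runs end to end."""
    return "APPROVE"

def steer(dilemma: str) -> None:
    # Ask the model to predict how a typical human would resolve the
    # dilemma, constrained to one of two actions we know how to execute.
    prompt = (
        f"A human faces this dilemma: {dilemma}\n"
        "Answer with exactly one word, APPROVE or DENY, as a typical human would."
    )
    prediction = query_llm(prompt).strip().upper()

    # The prediction (not any built-in preference of the model) determines
    # which command runs -- that dependence is the "steering".
    if prediction == "APPROVE":
        subprocess.run(["./grant_access.sh"], check=True)  # hypothetical script
    else:
        subprocess.run(["./deny_access.sh"], check=True)   # hypothetical script
```

The Klingon LLM dropped into the same loop would fail, not because it can’t emit tokens, but because its predictions carry no information about human responses to the dilemma.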
I think it’s possible for both LLM Psychology / Simulator Theory and the Platonic Representation Hypothesis to be true, and that we just need to apply both of them judiciously. The Platonic Representation Hypothesis also applies to humans, and your hypothetical aliens, so they’re not actually opposed.