But no model of a human mind on its own could really predict the tokens LLMs are trained on, right? Those tokens are created not only by humans, but by the processes that shape human experience, most of which we barely understand. To predict an ordinary social media post from one year in the future with real accuracy, for example, an LLM would need superhuman models of politics, sociology, economics, etc. To predict an experimental physics or biology paper very accurately, an LLM might need superhuman models of physics or biology.
Why should these models be limited to human cultural knowledge? The LLM isn't predicting what a human would predict about politics or physics; it's predicting what a human would experience, and its training gives it plenty of opportunity to test different models and see how well they predict the descriptions of that experience in its data set.
How could that knowledge be elicited in conversational text? Why not have the LLM predict tokens generated by itself? An LLM with a sufficiently accurate and up-to-date world model should know that it has superhuman world models. Whether it would predict that it would use those models when predicting itself might be something of a self-fulfilling prophecy, but if the prediction comes down to a kind of logical paradox, maybe you could nudge it, via RLHF, into resolving that paradox on the side of using those models.
Of course, none of that is a new idea; that sort of prompting is how most commercial LLMs are set up these days. As an empirical test, it might be worth finding out in which domains GPT-4 predicts ChatGPT is superhuman (if any), and then seeing whether the ChatGPT prompting actually produces superhuman results in those domains.
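For concreteness, here is a minimal sketch of the first half of that test, assuming the OpenAI Python SDK (v1+) and an API key in the environment; the model name, prompt wording, and the request for a bulleted list are illustrative assumptions rather than a fixed protocol.

```python
# A sketch of the first half of the proposed test, assuming the OpenAI Python SDK (>=1.0)
# and an OPENAI_API_KEY set in the environment. Model name and prompt wording are
# illustrative choices, not a fixed protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: ask GPT-4 in which domains (if any) it predicts ChatGPT is superhuman.
prediction = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "In which domains, if any, would you predict that ChatGPT's answers "
            "exceed expert human performance? Answer with a short bulleted list."
        ),
    }],
)
print(prediction.choices[0].message.content)

# Step 2 (not shown): for each domain the model lists, compare ChatGPT's outputs
# against an expert human baseline to see whether the predicted advantage shows up.
```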