At one level, “what I prefer” is information: it’s a sample of one, but still the most detailed insight into a human mind I’ll ever have. In that sense, my preferences feed, together with other inputs, into an algorithm that outputs predictions and recommended actions.

But at a higher level, “what I prefer” is also the stuff the algorithm itself is made of, because ultimately everything comes from me. Even something I try my very best to gather as empirical evidence from the world around me is filtered by me. If I am King Solomon and must do the right thing when two women each claim to be the mother of the same baby, I still need some way to judge, and to be convinced of, which woman is lying and which is telling the truth. And whatever my process is, it may be informed by my past experience, but it is still filtered through my own judgment, and so on. Just as with scientific and empirical matters: I can try to update my beliefs to best approximate some ideal truth, but I can never state with 100% certainty that I have reached it.

Appeals to the legitimacy of your own judgment don’t really help, because what is your own judgment? It’s like saying AIs are just statistics: but what is statistics[1]? You still need to understand the whole thing in order to designate something as your own judgment, to place the boundaries of legitimacy around justifications. Causally there are no boundaries to speak of; a mind is full of details that clearly come from outside any reasonable boundary and so should have no legitimacy as moral judgments.

Thus the right thing to do is not what is pleasurable, and not what humans prefer. It is not even what you yourself endorse, because considering the question should often shift what you endorse, including on the basis of things with no clear legitimacy as moral judgments in their own right. What is pleasurable, what humans prefer, and even what you yourself endorse are exactly this kind of relevant data: of no fundamental legitimacy in themselves, but often useful to take into account, even if only as lessons learned about the world rather than as direct votes.

[1] Ilya Sutskever (at 7:32 on the Dwarkesh Podcast): “If you think about it, what does it mean to predict the next token well enough? What does it actually mean? It’s a deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token. It’s not statistics. Like, it is statistics, but what is statistics? In order to understand those statistics, to compress them, you need to understand what it is about the world that creates those statistics.”
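To make the “it is statistics, but what is statistics?” point concrete, here is a toy sketch of my own (not from the quote; the corpus and function names are made up): a bigram model that predicts the next token purely from observed successor counts, the barest possible form of “prediction as statistics.”

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count, for each token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Predict the most frequent successor of `token` seen in training."""
    followers = counts.get(token)
    if not followers:
        return None  # token never seen with a successor
    return followers.most_common(1)[0][0]

# Tiny made-up corpus for illustration.
tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

A model like this only memorizes surface counts. The quote’s claim is that predicting well at scale demands something stronger: the counts themselves are generated by an underlying world, and compressing them well forces the predictor to capture that world implicitly.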