Daniel Kokotajlo comments on Procedurally evaluating factual accuracy: a request for research

Daniel Kokotajlo 30 Mar 2022 17:14 UTC
Thanks for writing this, I’m excited to see more work on this subject!
One minor musing: I think the problem is a bit more dire than the framing “who to align to” suggests. Humans are biased, including us, including me. A system which replicates those biases and tells us/me what we would have concluded if we investigated in our usual biased way… is “aligned” in some sense, but in a very important sense is unaligned.* To use Ajeya’s metaphor, it’s a sycophant, not a saint. Rather than helping us find the truth, it’ll help us become even more unreasonably overconfident and self-assured in the ideology we already endorse.
One reason I’m excited about research in this area is that hopefully we’ll be able to collect data from a wide range of political perspectives and diverse kinds of people, so that we can make political affiliation one of the variables the user can choose. That way, users can see how the bot’s answers differ depending on which bias it has. I expect this to be pretty helpful in a variety of ways.
*A provocative way of putting it that I nevertheless tentatively endorse: It’s aligned to your current ideology, not to you.