Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk
Do you think having that debate online was something that needed to happen for AI safety/x-risk? Do you think it benefited AI safety at all? I’m genuinely curious. My bet would be the opposite—that it caused AI safety to be more associated with political drama that helped further taint it.
Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked “tainted” to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)
The cognitive algorithm of “Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c.” wouldn’t have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).
An analogy: it’s actually easier to build a calculator that does correct arithmetic than it is to build a “triskaidekaphobic calculator” that does “correct arithmetic, except that it never displays the result 13”, because the simplest implementation of the latter is just a calculator plus an extra conditional that puts something else on the screen when the real answer would have been 13.
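To make the “calculator plus an extra conditional” construction concrete, here is a minimal Python sketch. It assumes a two-argument addition operation and uses the string "ERROR" as a hypothetical placeholder for whatever goes on the screen instead of 13, since the analogy doesn’t specify a substitute. The point it illustrates is that the wrapper can only suppress 13 by first computing the real answer correctly.

```python
from typing import Callable


def add(a: int, b: int) -> int:
    """An ordinary calculator operation: correct arithmetic."""
    return a + b


def triskaidekaphobic(op: Callable[[int, int], int]) -> Callable[[int, int], str]:
    """Wrap a correct calculator operation so that it never displays 13.

    The wrapper still has to compute the true answer first; it only
    decides what to put on the screen afterwards.
    """
    def display(a: int, b: int) -> str:
        result = op(a, b)   # the real answer, computed correctly
        if result == 13:    # the one extra conditional
            return "ERROR"  # hypothetical placeholder shown instead of 13
        return str(result)
    return display


phobic_add = triskaidekaphobic(add)
print(phobic_add(1423, 1389))  # prints 2812
print(phobic_add(6, 7))        # prints ERROR
```

Nothing in the wrapper knows how to avoid 13 without already knowing how to add; take away the correct `add` and there is nothing left for the conditional to check.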
If you don’t actually understand how arithmetic works, but you feel intense social pressure to produce a machine that never displays the number 13, I don’t think you actually succeed at building a triskaidekaphobic calculator: you’re trying to solve a problem under constraints that make it impossible to solve a strictly easier problem.
Similarly, I conjecture that it’s actually easier to build a rationality/alignment research community that does systematically correct reasoning, than it is to build a Catholic rationality/alignment research community that does “systematically correct reasoning, except never saying anything the Pope disagrees with.” The latter is a strictly harder problem: you have to somehow both get the right answer, and throw out all of the steps of your reasoning that the Pope doesn’t want you to say.
You’re absolutely right that figuring out how politics and the psychology of offense work doesn’t directly help increase the power and prestige of the “AI safety” research agenda. It’s just that the caliber of thinkers who can solve AGI alignment should also be able to solve politics and the psychology of offense, much as a calculator that can compute 1423 + 1389 should also be able to compute 6 + 7.