I am interested in forming my own opinion about more of the key disagreements within the AI alignment field, such as whether there is a basin of attraction for corrigibility, whether there is a theory of rationality that is sufficiently precise to build hierarchies of abstraction, and to what extent there will be a competence gap.
In “Is That Your True Rejection?”, Eliezer Yudkowsky wrote:
I suspect that, in general, if two rationalists set out to resolve a disagreement that persisted past the first exchange, they should expect to find that the true sources of the disagreement are either hard to communicate, or hard to expose. E.g.:
- Uncommon, but well-supported, scientific knowledge or math;
- Long inferential distances;
- Hard-to-verbalize intuitions, perhaps stemming from specific visualizations;
- Zeitgeists inherited from a profession (that may have good reason for it);
- Patterns perceptually recognized from experience;
- Sheer habits of thought;
- Emotional commitments to believing in a particular outcome;
- Fear that a past mistake could be disproved;
- Deep self-deception for the sake of pride or other personal benefits.
I am assuming that something like this is happening in the key disagreements in AI alignment. The last three bullet points are somewhat uncharitable to proponents of a particular view, and also seem less likely to me to be the true sources of disagreement. Summarizing the first six bullet points, I want to say something like: some combination of “innate intuitions” and “life experiences” led e.g. Eliezer and Paul Christiano to arrive at different opinions. I want to go through a useful subset of the “life experiences” part, so that I can share some of the same intuitions.
To that end, my question is something like: What fields should I learn? What textbooks/textbook chapters/papers/articles should I read? What historical examples (from history of AI/ML or from the world at large) should I spend time thinking about? (The more specific the resource, the better.) What intuitions should I expect to build by going through this resource? In the question title I am using the word “exercise” pretty broadly.
If you believe one just needs to be born with one set of intuitions rather than another, and that there are no resources I can consume to refine my intuitions, then my question is instead more like: How can I better introspect so as to find out which side I am on? :)
Some ideas I am aware of:
- Reading discussions between Eliezer, Paul, and others: I’ve already done a lot of this; it just feels like I am no longer making much progress.
- Learning more theoretical computer science to build the Search For Solutions And Fundamental Obstructions intuition, as mentioned in this post.