I am interested in forming my own opinion about more of the key disagreements within the AI alignment field, such as whether there is a basin of attraction for corrigibility, whether there is a theory of rationality that is sufficiently precise to build hierarchies of abstraction, and to what extent there will be a competence gap.
In “Is That Your True Rejection?”, Eliezer Yudkowsky wrote:
I suspect that, in general, if two rationalists set out to resolve a disagreement that persisted past the first exchange, they should expect to find that the true sources of the disagreement are either hard to communicate, or hard to expose. E.g.:
- Uncommon, but well-supported, scientific knowledge or math;
- Long inferential distances;
- Hard-to-verbalize intuitions, perhaps stemming from specific visualizations;
- Zeitgeists inherited from a profession (that may have good reason for it);
- Patterns perceptually recognized from experience;
- Sheer habits of thought;
- Emotional commitments to believing in a particular outcome;
- Fear that a past mistake could be disproved;
- Deep self-deception for the sake of pride or other personal benefits.
I am assuming that something like this is happening in the key disagreements in AI alignment. The last three bullet points are somewhat uncharitable to proponents of a particular view, and also seem less likely to me to be the true sources of disagreement. Summarizing the first six bullet points, I want to say something like: some combination of “innate intuitions” and “life experiences” led e.g. Eliezer and Paul Christiano to arrive at different opinions. I want to go through a useful subset of the “life experiences” part, so that I can share some of the same intuitions.
To that end, my question is something like: What fields should I learn? What textbooks/textbook chapters/papers/articles should I read? What historical examples (from history of AI/ML or from the world at large) should I spend time thinking about? (The more specific the resource, the better.) What intuitions should I expect to build by going through this resource? In the question title I am using the word “exercise” pretty broadly.
If you believe one just needs to be born with one set of intuitions rather than another, and that there are no resources I can consume to refine my intuitions, then my question is instead more like: How can I better introspect so as to find out which side I am on? :)
Some ideas I am aware of:
- Reading discussions between Eliezer, Paul, and others: I’ve already done a lot of this; it just feels like I am no longer making much progress.
- Learning more theoretical computer science to build the Search For Solutions And Fundamental Obstructions intuition, as mentioned in this post.