Four questions I ask AI safety researchers

Over the last few months, I’ve been trying to develop a stronger inside view on AI safety research agendas.

As part of this quest, I’ve been having conversations with AI safety researchers. I notice myself often asking the following questions:

  1. What are you working on, and how does it help us get aligned AI?

  2. Imagine you’re talking to a smart high school student. How would you describe the alignment problem? And how would you describe how your work addresses it?

  3. Imagine that you came up with a solution to the specific problem you’re working on. Or even more boldly, imagine your entire program of research succeeds. What happens next? Concretely, how does this help us get aligned AI (and prevent unaligned AI)?

  4. What are the qualities you look for in promising AI safety researchers? (beyond general intelligence)

I find Question #1 useful for starting the conversation and introducing me to the person’s worldview.

I find Question #2 useful for getting clearer (and often more detailed) explanations of the person’s understanding of the alignment problem & how their work fits in. (Note that this is somewhat redundant with question #1, but I find that the questions often yield different answers.)

I find Question #3 useful for starting to develop an inside view on the research agenda & the person’s reasoning abilities. (Note that the point is to see how the person reasons about the question. I’m not looking for a “right answer”, but I am looking to see that someone has seriously thought about this and has reasonable takes.)

I find Question #4 useful for building stronger models of AI safety community-building.

I also ask a lot of follow-up questions. But these are the four questions I ask nearly everyone, and I think they’ve helped me develop a stronger inside view.

If I had to pick one question, I would probably ask #3. I think it’s a rather challenging question, and I’m generally impressed when people have coherent answers.