[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes


I had a 15-minute interview last night in which I was asked “why do you believe in xrisk, and what does AI have to do with it?” I thought it was too big a question for a 15-minute interview, but nevertheless dove into my inside view from first principles. Since diving into one’s inside view from first principles is really hard even outside of a fifteen-minute interview, I did a bad job and mostly rambled and babbled.

A broader motivation is that I’m interested in studying people’s inside views / gears-level models so as to hone my own.


In this exercise, you’re allowed premises; just try to point at them. It is not a “from first principles” sort of exercise. You’re also allowed jargon without worrying too much about how well the audience knows it (for example, in mine, which I’ll paste below, I assume familiarity with the single & multi quadrants from ARCHES).

The only real rule is to limit yourself to 15 minutes. That’s fifteen minutes of wall time, with a literal clock.

Suggestion: don’t read until you write!
