We Choose To Align AI

Epistemic status: poetry

“We choose to go to the moon! We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard. Because that goal will serve to organize the best of our skills and energies. Because that challenge is one we are willing to accept, one we are unwilling to postpone, and one we intend to win.”—John F. Kennedy

WE CHOOSE TO ALIGN AI IN THIS DECADE AND DO THE OTHER THINGS

JFK gave his “We choose to go to the moon!” speech in 1962. And when he said “in this decade”, he did not mean that we’d go to the moon before 1972. He meant we’d go to the moon before 1970.

Happy 2022! When I say we choose to align AI in this decade, I don’t mean before 2032. I mean before 2030. Maybe sooner if things go well. Do I think that’s actually doable? Yes. Also fuck you.

… and some other things! As long as we’re shooting for the metaphorical moon, might as well throw aging in the mix too. That seems doable by 2030.

NOT BECAUSE THEY ARE EASY, BUT BECAUSE THEY ARE HARD

Effective altruists talk a lot about “importance, neglectedness, and tractability”. The more important, neglected, and tractable a problem is, the more we should expect a high impact per unit of effort invested in it. The alignment problem scores through the roof on importance, and is still relatively neglected, but tractability is… um… not.

I’m not really an EA, at heart. When there’s low hanging fruit, I might pick it quickly and move on, or these days I might point it out to someone else and move on. Point is, the low hanging fruit is not what I’m here for. I’m here for the challenge. I study alignment and agency and the other things not because they are easy, but because they are hard.

The more EAs I meet, the more I realize that wanting the challenge is a load-bearing pillar of sanity when working on alignment.

When people first seriously think about alignment, a majority freak out. Existential threats are terrifying. And when people first seriously look at their own capabilities, or the capabilities of the world, to deal with the problem, a majority despair. This is not one of those things where someone says “terrible things will happen, but we have a solution ready to go, all we need is your help!”. Terrible things will happen, we don’t have a solution ready to go, and even figuring out how to help is a nontrivial problem. When people really come to grips with that, tears are a common response.

… but for someone who wants the challenge, the emotional response is different. The problem is terrifying? Our current capabilities seem woefully inadequate? Good; this problem is worthy. The part of me which looks at a rickety ladder 30 feet down into a dark tunnel and says “let’s go!” wants this. The part of me which looks at a cliff face with no clear path up and cracks its knuckles wants this. The part of me which looks at a problem with no clear solution and smiles wants this. The response isn’t tears, it’s “let’s fucking do this”.

BECAUSE THAT GOAL WILL SERVE TO ORGANIZE THE BEST OF OUR SKILLS AND ENERGIES

Why align an AI, rather than prove the Riemann hypothesis? Or calculate bits of Chaitin’s constant—we know that’s hard.

When faced with a hard problem, there’s this tendency to substitute easier problems, solve those instead, and call it progress. The Riemann hypothesis is too hard, so we pick some other function which looks kinda similar, and prove things about it instead. And sometimes that is progress! But other times, people just end up Goodharting on the new problem instead.

Alignment is a problem which needs to be solved. One day, reality will test us, and if we fail then it’s game over. Substitute an easier problem instead, and reality will ignore our easier solution and wipe us all out anyway.

That’s a core part of the appeal: we don’t have the option of just walking away, we don’t have the option of solving some easier problem instead.

(We still look for shortcuts and loopholes, of course. Those who despair look for shortcuts and loopholes because they want some hope to cling to. Those who seek challenge look for shortcuts and loopholes because if the problem does turn out to be easy, we want to solve it and move on.)

The alignment problem will serve to organize the best of our skills and energies because we can’t just substitute some other problem. It is a Schelling point in problem space, a problem around which I can organize my efforts and expect others to do the same, without everyone spontaneously sliding off to some other problem.

BECAUSE THAT CHALLENGE IS ONE WE ARE WILLING TO ACCEPT

Damn straight.

ONE WE ARE UNWILLING TO POSTPONE

Did I mention we’re on a timer, and we’re not sure when it will run out?

AND ONE WE INTEND TO WIN