Why “AI alignment” would be better renamed “Artificial Intention research”

“AI alignment” has the application, the agenda, less charitably the activism, right in the name. It is a lot like “Missiology” (the study of how to proselytize to “the savages”), which had to evolve into “Anthropology” in order to get atheists and Jews to participate. In the same way, “AI alignment” excludes, for example, people who are inclined to believe superintelligences will know better than us what is good and who don’t want to hamstring them. You may think we’re well rid of these people. But you’re still excluding people, and thereby reducing the amount of thinking that will be applied to the problem.

“Artificial Intention research” instead emphasizes the space of possible intentions, the space of possible minds, and stresses how intentions that are not natural (not constrained by evolution) will be different and weird.

And obviously “Artificial Intention” alliterates with and closely parallels “Artificial Intelligence”, so it is very catchy. Catchiness matters a lot when you want an idea to catch on at scale!

Extremely superficially, it doesn’t sound “tacked on” to Artificial Intelligence research; it sounds like a logical completion of it.

The necessity of alignment doesn’t have to be in the name, because it logically follows from the focus on intention, with this very simple argument:

  • Intention doesn’t have to be conscious or communicable. It is just a preference for some futures over others, inferred as an explanation for behavior that chooses some futures over others. Even single-celled organisms have basic intentions if they move towards nutrients or away from bad temperatures.

  • Therefore, anything that selectively acts in the world, including AI systems, can be modeled as having some intent that explains its behavior (see the toy sketch after this list).

  • So you’re always going to get an intent, and if you don’t design it thoughtfully you’ll get an essentially random one.

  • ...which is most likely bad (e.g. the paperclip maximizer) because it is random and different and weird.

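To make the “inferred intent” step concrete, here is a minimal toy sketch in Python (mine, not from anything above; the futures, the random utility, and the function names are all made up for illustration). A black-box agent whose utility over futures was never designed, just drawn at random, still yields a coherent preference ordering once we infer it from its pairwise choices:

```python
import random

# Toy illustration: any system that selectively picks some futures over
# others can be described as having an "intent" -- a preference ordering
# we infer purely from its observed choices.

FUTURES = ["more paperclips", "more humans flourishing", "more entropy", "status quo"]

def make_random_agent(seed):
    """An 'undesigned' agent: its utility over futures is essentially random."""
    rng = random.Random(seed)
    utility = {f: rng.random() for f in FUTURES}
    def choose(option_a, option_b):
        # The agent just acts; it never reports or "knows" its utility.
        return option_a if utility[option_a] >= utility[option_b] else option_b
    return choose

def infer_intent(choose):
    """Infer a preference ordering from behavior alone (revealed preference):
    count how often each future wins pairwise comparisons."""
    wins = {f: 0 for f in FUTURES}
    for a in FUTURES:
        for b in FUTURES:
            if a != b:
                wins[choose(a, b)] += 1
    return sorted(FUTURES, key=wins.get, reverse=True)

if __name__ == "__main__":
    agent = make_random_agent(seed=42)
    print("Inferred intent (most to least preferred):", infer_intent(agent))
    # The ordering is coherent and predicts the agent's behavior, yet it was
    # never designed -- it is essentially random, and a random intent over a
    # large space of futures is unlikely to rank human-friendly ones highly.
```
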
So this would continue to be useful for alignment, just like anthropology continued to be useful to the missionaries, and in fact was even more useful to them than the original missiology.

Having the Intelligence (the “I” in “AI alignment”) only implicitly in the name (via the alliteration and the close parallel) might lose some of the focus on how the Intelligence makes the Intention so much more consequential, if that isn’t obvious enough already. But it also lets us look at Grey Goo scenarios, another existential risk worth preventing.

Changing names will cause confusion, which is bad. But the shift from “friendly AI” to “AI alignment” went fine, because “AI alignment” just is a better name than “friendly AI”. I imagine there wouldn’t be much more trouble in a shift to an even better one. After all, “human-compatible” seems to be doing fine as well.

What do you think?