Why “AI alignment” would be better renamed “Artificial Intention research”

“AI alignment” has the application, the agenda, less charitably the activism, right in the name. It is a lot like “Missiology” (the study of how to proselytize to “the savages”), which had to evolve into “Anthropology” in order to get atheists and Jews to participate. In the same way, “AI alignment” excludes, for example, people who are inclined to believe superintelligences will know better than us what is good and who don’t want to hamstring them. You may think we’re well rid of these people. But you’re still excluding people, and thereby reducing the amount of thinking that will be applied to the problem.

“Artificial Intention research” instead emphasizes the space of possible intentions, the space of possible minds, and stresses how intentions that are not natural (not constrained by evolution) will be different and weird.

And obviously “Artificial Intention” alliterates with and closely parallels “Artificial Intelligence”, so it is very catchy. Catchiness matters a lot when you want an idea to catch on at scale!

Extremely superficially, it doesn’t sound “tacked on” to Artificial Intelligence research; it sounds like a logical completion of it.

The necessity of alignment doesn’t have to be in the name, because it logically follows from the focus on intention, with this very simple argument:

  • Intention doesn’t have to be conscious or communicable. It is just a preference for some futures over others, inferred as an explanation for behavior that chooses some futures over others. Even single-celled organisms have basic intentions if they move towards nutrients or away from bad temperatures.

  • Therefore, anything that selectively acts in the world, including AI systems, can be modeled as having some intent that explains its behavior (see the toy sketch after this list).

  • So you’re always going to get an intent, and if you don’t design it thoughtfully you’ll get an essentially random one.

  • ...which is most likely bad (e.g. the paperclip maximizer) because it is random and different and weird.

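To make the “inferred intent” step concrete, here is a minimal toy sketch in Python (mine, not from anything above; the futures, the random utility, and the function names are all made up for illustration). A black-box agent whose utility over futures was never designed, just drawn at random, still yields a coherent preference ordering once we infer it from its pairwise choices:

```python
import random

# Toy illustration: any system that selectively picks some futures over
# others can be described as having an "intent" -- a preference ordering
# we infer purely from its observed choices.

FUTURES = ["more paperclips", "more humans flourishing", "more entropy", "status quo"]

def make_random_agent(seed):
    """An 'undesigned' agent: its utility over futures is essentially random."""
    rng = random.Random(seed)
    utility = {f: rng.random() for f in FUTURES}
    def choose(option_a, option_b):
        # The agent just acts; it never reports or "knows" its utility.
        return option_a if utility[option_a] >= utility[option_b] else option_b
    return choose

def infer_intent(choose):
    """Infer a preference ordering from behavior alone (revealed preference):
    count how often each future wins pairwise comparisons."""
    wins = {f: 0 for f in FUTURES}
    for a in FUTURES:
        for b in FUTURES:
            if a != b:
                wins[choose(a, b)] += 1
    return sorted(FUTURES, key=wins.get, reverse=True)

if __name__ == "__main__":
    agent = make_random_agent(seed=42)
    print("Inferred intent (most to least preferred):", infer_intent(agent))
    # The ordering is coherent and predicts the agent's behavior, yet it was
    # never designed -- it is essentially random, and a random intent over a
    # large space of futures is unlikely to rank human-friendly ones highly.
```
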
So this would continue to be useful for alignment, just like anthropology continued to be useful to the missionaries, and in fact was even more useful to them than the original missiology.

Having the Intelligence (the “I” in “AI alignment”) only implicitly in the name (via the alliteration and the close parallel) might lose some of the focus on how the Intelligence makes the Intention so much more consequential, if that isn’t obvious enough already. But it also lets us look at Grey Goo scenarios, another existential risk worth preventing.

Changing names will cause confusion, which is bad. But the shift from “friendly AI” to “AI alignment” went fine, because “AI alignment” just is a better name than “friendly AI”. I imagine there wouldn’t be much more trouble in a shift to an even better one. After all, “human-compatible” seems to be doing fine as well.

What do you think?