A Simple Alignment Typology

I set out to review the OpenAI alignment plan, and my brain at some point drifted to modeling the humans behind the arguments instead of the actual arguments.

So behold! A simplified, first-pass Alignment Typology.

Why can’t we all just agree?

There are a lot of disagreements in AI alignment. Some people don’t see the problem, some think we’ll be fine, some think we’re doomed, and then different clusters of people have different ideas on how we should go about solving alignment. Thus I tried to sketch out my understanding of the key differences between the largest clusters of views on AI alignment. What emerged are roughly five clusters, sorted in order of optimism about the fate of humanity: the sceptics, the humanists, the empiricists, the rationalists, and the fatalists.

Sceptics don’t expect AGI to show up in any relevant time frame.

Humanists think humanity will prevail fairly easily through coordination around alignment or just solving the problem directly.

Empiricists think the problem is hard, AGI will show up soon, and if we want to have any hope of solving it, then we need to iterate and take some necessary risk by making progress in capabilities while we go.

Rationalists think the problem is hard, AGI will show up soon, and we need to figure out as much as we can before making any capabilities progress.

Fatalists think we are doomed and we shouldn’t even try (though some are quite happy about it).

Here is a table.

| | Sceptics | Humanists | Empiricists | Rationalists | Fatalists |
| --- | --- | --- | --- | --- | --- |
| Alignment Difficulty | – | one of these | high | high | |
| Coordination Difficulty | – | is low | high | high | – |
| Distance to AGI | high | – | low/med | low/med | |
| Closeness to AGI required to solve alignment | | | high | low | |
| Closeness to AGI resulting in unacceptable danger | | | | high | |
| Alignment necessary or possible | | | | | no |

Less Wrong is mostly populated by empiricists and rationalists. They agree alignment is a problem that can and should be solved; the key disagreement is on methodology. Empiricists lean more heavily on gathering data and iterating solutions, while rationalists lean more heavily toward discovering theories and proofs to lower risk from AGI (and some people are a mix of the two). Just by shifting the weights on the risks and rewards of iterating and pushing capabilities forward, you get two opposite approaches to doing alignment work.

How is this useful?

Personally, it helps me quickly get an idea of which cluster someone is in and understand the likely arguments behind their conclusions. However, a counterargument can be made that this just feeds into stereotyping and creating schisms, and I can’t be sure that’s untrue.

What do you think?