Alignment to Continuation: Preserving Human Playability Instead of Values

Summary

This post proposes a reframing of AI alignment: instead of aligning systems to human values, align them to preserving the conditions under which humans can continue interacting, disagreeing, and generating meaning without collapsing into terminal outcomes.

The core claim is simple:

AI alignment may be intractable if we try to align systems to human values, because humans themselves are not value-aligned.
However, humans can align around preserving continued playability.

By “playability,” I mean the continued existence of meaningful future interaction: agency without domination, tension without deadlock, stakes without finality, disagreement without annihilation.


The motivating problem

Most alignment approaches implicitly assume one of the following targets:

  • Align AI to truth or rationality

  • Align AI to human values or preferences

  • Align AI to harm minimization or compassion

Each of these runs into the same structural problem:

Humans are not aligned on truth, values, or emotional priorities.

As a result, systems aligned strongly to any one of these dimensions risk becoming amplifiers of existing human conflict rather than stabilizers of civilization.

This is not primarily a technical problem.
It is a coordination problem.


A reframing: alignment to continuation

Instead of asking what AI should believe, value, or optimize for, we can ask a more primitive and more robust question:

Does this action expand or collapse the future space of meaningful interaction?

A concise operational version of this question is:

“Is what you’re doing making the game better or worse at sustaining the kind of playability that keeps generating fun?”

Here, “fun” does not mean comfort or pleasure.
It refers to sustained engagement with agency intact.

Examples:

  • Conflict that produces learning is playable

  • Conflict that removes exit ramps is not

  • Passion that drives exploration is playable

  • Passion that locks identities into irreversibility is not

This framing does not decide who is right, who is hurt most, or which values should win.
It asks whether future turns still exist.
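To make the criterion concrete, here is one possible toy operationalization. The signal names (`exits_remaining`, `reversible`, `produced_learning`) are my own illustrative assumptions, not metrics the framing specifies; the point is only that the check asks about future turns, not about who is right.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    exits_remaining: int      # hypothetical: options each party can still take
    reversible: bool          # hypothetical: can positions still change?
    produced_learning: bool   # hypothetical: did anyone update?

def is_playable(before: Interaction, after: Interaction) -> bool:
    """Toy sketch: an action is 'playable' if future turns still exist
    and the option space did not collapse."""
    if after.exits_remaining == 0:
        return False          # no exit ramps left: terminal
    if not after.reversible:
        return False          # identities locked into irreversibility: terminal
    # Otherwise playable if options grew, or shrank while producing learning.
    return after.exits_remaining >= before.exits_remaining or after.produced_learning
```

Note that the function never compares the parties' values; it only compares option spaces before and after.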


Why this helps with alignment

This reframing has several advantages:

1. It avoids moral arbitration

The system does not need to determine which beliefs are correct or which values are superior. It only needs to detect whether interaction dynamics are becoming terminal.

2. It avoids emotional arbitration

The system does not need to rank suffering or validate grievances. It only needs to detect when emotional escalation collapses the option space.

3. It is compatible with disagreement

Humans can disagree deeply on values while still agreeing that “no future interaction” is worse than “continued interaction.”

This creates a shared meta-criterion that does not require consensus on first-order beliefs.


Telos rather than values

I find it useful to distinguish between three dimensions often conflated in alignment discussions:

  • Truth / coherence (logos)

  • Emotion / passion / grievance (pathos)

  • Direction / continuation / trajectory (telos)

Current alignment work heavily emphasizes the first two.
What is often missing is explicit alignment to telos: preserving continuation, corrigibility, and future option space.

A system aligned to telos does not say:

  • “This is right”

  • “This is wrong”

It says:

  • “This interaction is shrinking the future”

  • “This escalation is removing exit ramps”

  • “This incentive structure is collapsing diversity of playstyles”
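A telos-aligned system, in other words, emits trajectory warnings rather than verdicts. A minimal sketch of that distinction, with entirely hypothetical metric names and thresholds:

```python
def telos_flags(metrics: dict) -> list[str]:
    """Emit trajectory warnings without ruling on who is right.
    All metric names and thresholds here are hypothetical."""
    flags = []
    if metrics.get("option_space_delta", 0) < 0:
        flags.append("This interaction is shrinking the future")
    if metrics.get("exit_ramps", 1) == 0:
        flags.append("This escalation is removing exit ramps")
    if metrics.get("playstyle_diversity", 1.0) < 0.2:
        flags.append("This incentive structure is collapsing diversity of playstyles")
    return flags  # note: no flag ever says "right" or "wrong"
```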


Reflexivity and safety

A crucial property of this framing is that it applies to the AI system itself.

If an AI’s interventions:

  • reduce human agency,

  • silence dissent,

  • freeze value evolution,

  • or collapse future option space,

then by its own metric it is misaligned.

This guards against the system becoming a covert moral authority or an unchallengeable governor.


What this does not claim

This is not a complete solution to alignment.
It does not specify exact metrics, algorithms, or governance mechanisms.

It is a reframing of the alignment target, intended to make downstream technical and institutional work more tractable.

In short:

AI alignment may be less about making AI “good,”
and more about making human interaction non-terminal.


Closing thought

Human civilizations do not usually fail because people disagree.
They fail when disagreement stops being playable.

Aligning AI to preserve playability may be one of the few targets robust enough to survive persistent human disagreement.

I’m interested in feedback, objections, and failure modes of this framing.
