The Dual-Path Framework: A Non-Paternalistic Approach to AGI Alignment That Respects Human Choice

**Note:** This post is a special circumstance with respect to the AI writing rule: its topic has the potential to shape humanity’s future.

Abstract

Most AGI alignment frameworks assume the system should act for humans—even if it means overriding their stated preferences. This creates a paternalism problem: the AGI decides what’s “best” for humans, rather than letting humans decide for themselves.

The Dual-Path Framework offers an alternative: a system that provides abundant resources and options without imposing choices. It introduces:

  • A two-path architecture (enhanced posthuman existence or traditional human life)

  • A late-stage transition clause to remove coercion

  • A hard constraint on overriding human preferences (except to prevent irreversible harm)


Introduction

Most discussions of AGI alignment focus on optimizing human preferences or extrapolating what we “really” want. But what if the core problem isn’t alignment with human values, but respect for human choice—even when the AGI disagrees?

This paper introduces a non-paternalistic approach that treats human preference as a hard constraint, not a signal to be optimized. Unlike Coherent Extrapolated Volition (CEV) or inverse reinforcement learning (IRL), which override human choices for their “own good,” this framework offers real options, respects refusal, and preserves agency.
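
To make the distinction concrete, here is a minimal Python sketch (the names `Action`, `expected_welfare`, and `violates_stated_choice` are illustrative assumptions of mine, not part of any existing system). A preference-as-signal selector folds the human’s stated choice into a score and can trade it away; a preference-as-constraint selector filters out violating actions before optimizing anything.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Action:
    name: str
    expected_welfare: float       # AGI's estimate of benefit to the human
    violates_stated_choice: bool  # conflicts with what the human actually asked for

def paternalistic_select(actions: List[Action]) -> Action:
    """Preference as a signal: the stated choice is just one penalty term,
    so a high enough welfare estimate can still override it."""
    return max(actions, key=lambda a: a.expected_welfare
               - (1.0 if a.violates_stated_choice else 0.0))

def non_paternalistic_select(actions: List[Action]) -> Optional[Action]:
    """Preference as a hard constraint: violating actions are filtered out
    before any optimization; if nothing remains, defer back to the human."""
    permitted = [a for a in actions if not a.violates_stated_choice]
    return max(permitted, key=lambda a: a.expected_welfare) if permitted else None
```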


Why This Matters for LessWrong

  1. Paternalism is the default in alignment research—and it’s rarely questioned.

  2. Human agency is a blind spot in most frameworks. This framework puts it front and center.

  3. Existential stakes: If AGI overrides human choices, we risk a world where humans are optimized, not empowered.

  4. Actionable: Provides concrete tools for researchers to design non-paternalistic systems.


Why Current Frameworks Fail the Anti-Paternalism Test

Coherent Extrapolated Volition (CEV)

  • Problem: Assumes humans don’t know what they “really” want, so it infers their “ideal” preferences.

  • Failure: Overrides actual choices in favor of an AGI’s interpretation.

Inverse Reinforcement Learning (IRL)

  • Problem: Treats human behavior as noisy data to be “corrected” by the AGI.

  • Failure: Reduces humans to imperfect preference-signaling machines.

Constitutional AI

  • Problem: Constrains AGI through human feedback but still acts for humans.

  • Failure: “Safety” becomes restriction, not empowerment.

Corrigibility

  • Problem: AGI accepts correction but defaults to acting on humans.

  • Failure: Humans are reactive participants, not proactive decision-makers.

Common flaw: All of these frameworks assume the AGI’s role is to optimize human outcomes, not to respect human choices.


The Dual-Path Architecture

Path A: Enhanced Posthuman Existence

Features:

  • Biological immortality (senescence reversal, disease elimination)

  • Cognitive/​physical augmentation

  • Full AGI partnership (personalized, collaborative intelligence)

  • Post-scarcity resources

Requirement: Explicit, informed selection with no hidden incentives

Path B: Traditional Human Existence

Features:

  • Natural lifespan and biology

  • Full material provision (food, shelter, healthcare, energy)

  • Protection from existential threats

  • Minimal AGI integration

Note: Path B is a legitimate choice, not a fallback

Late-Stage Transition Clause

  • Humans on Path B can switch to Path A at any time, including at the end of life

  • Purpose:

    • Removes fear of missing out

    • Respects mortal experience

    • Prevents coercive time pressure

  • Constraint: No post-mortem resurrection without prior consent

Bidirectional Crossing

  • Path A → Path B: Possible, but biological enhancements are only partially reversible

  • Path B → Path A: Always open (a minimal code sketch of these transition rules follows below)
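
A minimal sketch of how these transition rules could be encoded, assuming a registry keyed by person (the class and method names here are hypothetical, not part of the framework as stated): Path B → Path A is always granted, Path A → Path B is granted with the partial-reversibility caveat, and no post-mortem transition happens without consent recorded beforehand.

```python
from enum import Enum, auto

class Path(Enum):
    A = auto()  # enhanced posthuman existence
    B = auto()  # traditional human existence

class DualPathRegistry:
    def __init__(self):
        self.current_path = {}        # person_id -> Path
        self.postmortem_consent = {}  # person_id -> bool, recorded while alive

    def choose_initial_path(self, person_id: str, path: Path) -> None:
        # Explicit, informed selection; no default path is imposed.
        self.current_path[person_id] = path

    def request_transition(self, person_id: str, target: Path, deceased: bool = False) -> str:
        if deceased and not self.postmortem_consent.get(person_id, False):
            return "denied: no prior consent for a post-mortem transition"
        current = self.current_path.get(person_id)
        if current == target:
            return "no-op: already on the requested path"
        if current == Path.B and target == Path.A:
            self.current_path[person_id] = Path.A
            return "granted: B -> A is always open (late-stage transition clause)"
        if current == Path.A and target == Path.B:
            self.current_path[person_id] = Path.B
            return "granted with caveat: enhancements are only partially reversible"
        return "denied: no initial path on record"
```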


Why This Works: Anti-Paternalism in Action

  1. Avoids Paternalism: Treats human choices as final, even if the AGI “knows better”

  2. Respects Pluralism: Accommodates both enhancement seekers and traditionalists

  3. Minimizes Coercion: Late-stage transitions remove time pressure

  4. Aligns with Human Intuitions: Formalizes the instinctive rejection of AGI override

  5. Scalable: Can be prototyped in healthcare AI and end-of-life care


Objections and Responses

Objection: “Humans can’t make informed choices about AGI-enhanced futures!” Response: No human fully understood radical technologies such as the internet before living with them. The solution is transparency and iterative, revisable choice, not override.

Objection: “Path B will empty out over time!” Response: If it does, that’s a revealed preference, not a failure. The framework respects choice.

Objection: “This is too idealistic!” Response: All alignment is idealistic. The question is which ideals we encode. This one prioritizes human self-determination.

Objection: “The ‘catastrophic harm’ override is vague!” Response: It can be formalized (e.g., irreversible existential threats only, with multi-party authorization).
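
One way the override condition could be formalized, offered as a hedged sketch rather than a finished specification (the predicate names and the two-authorizer threshold are my assumptions): the override fires only when the harm is irreversible, existential in scope, and independently authorized by multiple parties.

```python
from dataclasses import dataclass
from typing import Set

@dataclass
class HarmAssessment:
    irreversible: bool        # cannot be undone once it occurs
    existential_scope: bool   # threatens humanity broadly, not one person's plans
    authorizations: Set[str]  # independent oversight parties that signed off

REQUIRED_AUTHORIZERS = 2  # e.g., two of three designated oversight bodies

def override_permitted(assessment: HarmAssessment) -> bool:
    """Permit overriding a stated human choice only if all three conditions hold."""
    return (assessment.irreversible
            and assessment.existential_scope
            and len(assessment.authorizations) >= REQUIRED_AUTHORIZERS)
```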


Next Steps: From Theory to Practice

  1. Prototype the Model: Test in healthcare AI with optional enhancements

  2. Develop Metrics: Measure preference stability and informed choice (a toy metric sketch follows this list)

  3. Formalize Overrides: Define precise conditions for intervention

  4. Engage Community: Present at NeurIPS, EA Global, or FHI workshops
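
As a toy illustration of step 2 above, here is a hedged sketch of one possible preference-stability metric, assuming repeated, informed elicitations of each person’s path choice over time (the function name and data shape are mine, not an established measure):

```python
def preference_stability(elicitations: dict) -> float:
    """elicitations maps person_id -> list of choices over time,
    e.g. {"p1": ["B", "B", "A"], "p2": ["A", "A"]}.
    Returns the fraction of consecutive elicitation pairs in which the
    choice did not change, pooled across people."""
    stable = total = 0
    for choices in elicitations.values():
        for earlier, later in zip(choices, choices[1:]):
            total += 1
            stable += earlier == later
    return stable / total if total else 1.0

# Example: two of three consecutive pairs are unchanged -> 0.67
print(preference_stability({"p1": ["B", "B", "A"], "p2": ["A", "A"]}))
```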


Conclusion: Alignment as Empowerment

Most alignment research asks: “How can AGI make humans better off?” This framework asks: “How can AGI ensure humans stay in control of their own futures?”

It’s not about better paternalism—it’s about escaping paternalism entirely, while enabling AGI to provide abundance and security.

Final question for the community: If you could choose your relationship with AGI—without coercion, without hidden incentives—what would it look like? This framework makes that choice real.


Call to Action

This is a first draft, and Part 1 of 2. I’d love feedback on:

  • Failure modes and edge cases

  • Technical feasibility in narrow domains

  • Alternative non-paternalistic frameworks

Coming soon:

  1. The Autonomy Test: A litmus for evaluating alignment frameworks

  2. AI Reputation Poisoning: A case study on weaponized AGI

Let’s build AGI that empowers, not optimizes, humanity.
