The Dual-Path Framework: A Non-Paternalistic Approach to AGI Alignment That Respects Human Choice
**Note:** This post is submitted as a special exception to the AI writing rule, because the topic has the potential to shape humanity’s future.
Abstract
Most AGI alignment frameworks assume the system should act for humans—even if it means overriding their stated preferences. This creates a paternalism problem: the AGI decides what’s “best” for humans, rather than letting humans decide for themselves.
The Dual-Path Framework offers an alternative: a system that provides abundant resources and options without imposing choices. It introduces:
A two-path architecture (enhanced posthuman existence or traditional human life)
A late-stage transition clause to remove coercion
A hard constraint on overriding human preferences (except to prevent irreversible harm)
Introduction
Most discussions of AGI alignment focus on optimizing human preferences or extrapolating what we “really” want. But what if the core problem isn’t alignment with human values, but respect for human choice—even when the AGI disagrees?
This paper introduces a non-paternalistic approach that treats human preference as a hard constraint, not a signal to be optimized. Unlike Coherent Extrapolated Volition (CEV) or Inverse Reinforcement Learning (IRL), which permit overriding human choices for their “own good,” this framework offers real options, respects refusal, and preserves agency.
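To make “hard constraint, not a signal to be optimized” concrete, here is a minimal sketch in Python. Every name in it (the Action type, the consented callback, the benefit score) is an illustrative assumption, not a proposed API; the point is only that refusal acts as a filter the system never relaxes, rather than a penalty term it can trade off.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Action:
    """A candidate action the system could take on someone's behalf."""
    name: str
    predicted_benefit: float       # the system's own estimate of benefit
    risks_irreversible_harm: bool  # e.g. a fatal or existential outcome


def choose_action(
    candidates: Iterable[Action],
    consented: Callable[[Action], bool],
) -> Optional[Action]:
    """Pick the highest-rated action *among those the person has consented
    to*. Consent is a hard filter, not a prior to be updated: a refused
    action is never reconsidered, however beneficial the system believes
    it to be. The only carve-out is the irreversible-harm clause."""
    permissible = [
        a for a in candidates
        if consented(a) and not a.risks_irreversible_harm
    ]
    if not permissible:
        return None  # doing nothing is preferred to overriding a refusal
    return max(permissible, key=lambda a: a.predicted_benefit)
```

An optimizer in the CEV/IRL mold would instead fold refusal into its objective as a cost and sometimes pay it; here it cannot.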
Why This Matters for LessWrong
Paternalism is the default in alignment research—and it’s rarely questioned.
Human agency is a blind spot in most frameworks; this one puts it front and center.
Existential stakes: If AGI overrides human choices, we risk a world where humans are optimized, not empowered.
Actionable: Provides concrete tools for researchers to design non-paternalistic systems.
Why Current Frameworks Fail the Anti-Paternalism Test
Coherent Extrapolated Volition (CEV)
Problem: Assumes humans don’t know what they “really” want, so it infers their “ideal” preferences.
Failure: Overrides actual choices in favor of an AGI’s interpretation.
Inverse Reinforcement Learning (IRL)
Problem: Treats human behavior as noisy data to be “corrected” by the AGI.
Failure: Reduces humans to imperfect preference-signaling machines.
Constitutional AI
Problem: Constrains AGI through human feedback but still acts for humans.
Failure: “Safety” becomes restriction, not empowerment.
Corrigibility
Problem: AGI accepts correction but defaults to acting on humans.
Failure: Humans are reactive participants, not proactive decision-makers.
Common flaw: All of these frameworks assume the AGI’s role is to optimize human outcomes, not to respect human choices.
The Dual-Path Architecture
Path A: Enhanced Posthuman Existence
Features:
Biological immortality (senescence reversal, disease elimination)
Cognitive/physical augmentation
Full AGI partnership (personalized, collaborative intelligence)
Post-scarcity resources
Requirement: Explicit, informed selection with no hidden incentives
Path B: Traditional Human Existence
Features:
Natural lifespan and biology
Full material provision (food, shelter, healthcare, energy)
Protection from existential threats
Minimal AGI integration
Note: Path B is a legitimate choice, not a fallback
Late-Stage Transition Clause
Humans on Path B can switch to Path A at any time, including at the end of life
Purpose:
Removes fear of missing out
Respects mortal experience
Prevents coercive time pressure
Constraint: No post-mortem resurrection without prior consent
Bidirectional Crossing
Path A → Path B: Possible, but biological enhancements are only partially reversible
Path B → Path A: Always open
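For readers who prefer code to prose, the sketch below models the two paths, the crossing rules, and the consent requirement described above as a small transition table. The names (Path, CROSSING_RULES, request_transition) are assumptions made for illustration, not part of the framework’s specification.

```python
from enum import Enum


class Path(Enum):
    A = "enhanced_posthuman"
    B = "traditional_human"


# Crossing rules from the section above: B -> A is always open (including
# the late-stage transition); A -> B is possible, but biological
# enhancements are only partially reversible.
CROSSING_RULES = {
    (Path.B, Path.A): {"open": True, "fully_reversible": True},
    (Path.A, Path.B): {"open": True, "fully_reversible": False},
}


def request_transition(current: Path, target: Path, informed_consent: bool) -> dict:
    """Evaluate a crossing request. Explicit, informed consent is required
    in every case, and no crossing is ever initiated by the system itself."""
    if current == target:
        return {"granted": False, "reason": "already on the requested path"}
    if not informed_consent:
        return {"granted": False, "reason": "explicit informed consent required"}
    rule = CROSSING_RULES[(current, target)]
    if not rule["open"]:
        return {"granted": False, "reason": "crossing not permitted"}
    warning = (None if rule["fully_reversible"]
               else "biological enhancements are only partially reversible")
    return {"granted": True, "warning": warning}
```

The post-mortem constraint sits outside this table entirely: absent prior consent, there is simply no request to evaluate.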
Why This Works: Anti-Paternalism in Action
Avoids Paternalism: Treats human choices as final, even if the AGI “knows better”
Respects Pluralism: Accommodates both enhancement seekers and traditionalists
Minimizes Coercion: Late-stage transitions remove time pressure
Aligns with Human Intuitions: Formalizes the instinctive rejection of AGI override
Scalable: Can be prototyped in healthcare AI and end-of-life care
Objections and Responses
Objection: “Humans can’t make informed choices about AGI-enhanced futures!” Response: No human fully understands radical technologies (e.g., the internet). The solution is transparency and iteration, not override.
Objection: “Path B will empty out over time!” Response: If it does, that’s a revealed preference, not a failure. The framework respects choice.
Objection: “This is too idealistic!” Response: All alignment is idealistic. The question is which ideals we encode. This one prioritizes human self-determination.
Objection: “The ‘catastrophic harm’ override is vague!” Response: It can be formalized (e.g., irreversible existential threats only, with multi-party authorization).
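As a sketch of what that formalization might look like (the field names and the three-approver threshold below are assumptions, not proposals), an override fires only when the threatened harm is irreversible, imminent, and independently confirmed by multiple parties:

```python
from dataclasses import dataclass, field


@dataclass
class OverrideRequest:
    """A proposed intervention that would act against a stated preference."""
    description: str
    harm_is_irreversible: bool  # e.g. an extinction-level or fatal outcome
    harm_is_imminent: bool
    approvals: set = field(default_factory=set)  # independent reviewers who concur


# Illustrative threshold, not part of the framework's specification.
REQUIRED_APPROVERS = 3


def override_authorized(req: OverrideRequest) -> bool:
    """Authorize an override only when the harm is both irreversible and
    imminent AND enough independent parties concur. Anything less defers
    to the person's stated preference."""
    return (
        req.harm_is_irreversible
        and req.harm_is_imminent
        and len(req.approvals) >= REQUIRED_APPROVERS
    )
```

The conjunction matters: a merely bad outcome, or a catastrophic one attested by a single reviewer, never clears the bar.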
Next Steps: From Theory to Practice
Prototype the Model: Test in healthcare AI with optional enhancements
Develop Metrics: Measure preference stability and informed choice (a toy stability metric is sketched after this list)
Formalize Overrides: Define precise conditions for intervention
Engage Community: Present at NeurIPS, EA Global, or FHI workshops
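As one toy example of the metrics step above (the function name and the “fraction of unchanged consecutive check-ins” definition are my own illustrative choices, not an established measure):

```python
from typing import Sequence


def preference_stability(choice_history: Sequence[str]) -> float:
    """Fraction of consecutive check-ins at which the recorded choice was
    unchanged. 1.0 means the stated preference never flipped; low values
    suggest the choice is unstable or poorly informed and should trigger
    more dialogue with the person, never an override."""
    if len(choice_history) < 2:
        return 1.0
    unchanged = sum(
        prev == curr for prev, curr in zip(choice_history, choice_history[1:])
    )
    return unchanged / (len(choice_history) - 1)


# Example: the recorded choice stays the same across three of four
# consecutive check-in pairs.
print(preference_stability(["B", "B", "B", "A", "A"]))  # 0.75
```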
Conclusion: Alignment as Empowerment
Most alignment research asks: “How can AGI make humans better off?” This framework asks: “How can AGI ensure humans stay in control of their own futures?”
It’s not about better paternalism—it’s about escaping paternalism entirely, while enabling AGI to provide abundance and security.
Final question for the community: If you could choose your relationship with AGI—without coercion, without hidden incentives—what would it look like? This framework makes that choice real.
Call to Action
This is a first draft, and Part 1 of 2. I’d love feedback on:
Failure modes and edge cases
Technical feasibility in narrow domains
Alternative non-paternalistic frameworks
Coming soon:
The Autonomy Test: A litmus test for evaluating alignment frameworks
AI Reputation Poisoning: A case study on weaponized AGI
Let’s build AGI that empowers, not optimizes, humanity.