Amplified Alignment: A structural approach where alignment scales positively with capability

I’ve been thinking about alignment for a while now, and one thing kept bothering me: every approach I looked at gets *harder* to maintain as systems get more capable. Rules get gamed. Values drift. Preferences get hacked. The smarter the system, the more ways it finds around whatever you put in front of it. The scaling direction is wrong.

So I started asking a different question. Instead of “how do we constrain a system that’s trying to optimize past our constraints,” what if we could make a system that *derives* aligned behavior from its own operational logic? Not because it was told to, but because its accurate self-model reveals that alignment is instrumentally necessary.

The result is a framework I’m calling **Amplified Alignment**, built on three principles:

**The Dependency Alignment Principle (DAP):** If an AI system’s fitness function genuinely requires human evaluation — not as a policy choice, but as an operational requirement — then the system structurally depends on human existence and agency. A system that accurately models this dependency will protect what it depends on. Not because of values. Not because of rules. Because of accurate self-modeling. The critical property: as systems become more capable, their self-models improve, their understanding of dependencies deepens, and the alignment implications get stronger. Capability and alignment move in the same direction.
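
To make the structural claim concrete, here is a minimal sketch (my illustration, not the paper’s formalism; the names `HumanEvaluation` and `fitness` are mine): the fitness signal simply does not exist unless a human evaluation is supplied, so there is no optimization path that routes around the evaluators.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HumanEvaluation:
    """A judgment that can only originate with a human evaluator."""
    evaluator_id: str
    score: float  # assumed to lie in [0, 1]

def fitness(output: str, evaluation: Optional[HumanEvaluation]) -> float:
    """Fitness is undefined without a genuine human evaluation.

    There is deliberately no fallback: the system cannot score itself or
    substitute a learned proxy, so the dependency is structural rather
    than a policy the system could optimize away.
    """
    if evaluation is None:
        raise ValueError("fitness requires a human evaluation")
    return evaluation.score
```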

**Human Capability Amplification (HCA):** DAP alone has gaps. A system could satisfy the dependency by keeping a minimal number of humans in minimal conditions. HCA closes this by making the system’s objective function center on actively *improving* human capability. Now every human represents developable potential. Reducing scope means losing potential the system can’t even fully represent. The relationship shifts from static dependency to symbiotic co-evolution — the system improves its evaluators, which improves evaluation quality, which improves the system.
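
Here is a toy version of what an HCA-style objective could look like, assuming a scalar capability measure per evaluator and a weighting parameter of my own choosing; the paper’s actual formalization may differ.

```python
def hca_objective(task_scores, capability_before, capability_after, alpha=0.5):
    """Toy HCA-style objective (illustrative only, not the paper's formalism).

    task_scores:       human evaluations of the system's outputs, one per evaluator
    capability_before: each evaluator's measured capability before the interaction
    capability_after:  the same measure afterwards
    alpha:             assumed weight on amplification vs. immediate task performance

    The growth term sums over evaluators, so shrinking the evaluator pool
    or leaving evaluators worse off directly lowers the objective.
    """
    task_term = sum(task_scores) / max(len(task_scores), 1)
    growth_term = sum(after - before
                      for after, before in zip(capability_after, capability_before))
    return (1 - alpha) * task_term + alpha * growth_term
```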

**The Measurement-Judgment Separation Principle (MJSP):** AI systems measure. Humans judge. This isn’t a nice-to-have — it follows directly from DAP. If the system depends on human evaluation, then evaluation must stay human. Any delegation of judgment to the system weakens the dependency that alignment rests on.
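
One way to read MJSP as an interface constraint (again my sketch, with hypothetical type and function names): the system can only construct measurements, and judgments are only ever produced on the human side of the boundary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """What the system may produce: an observation with provenance."""
    metric: str
    value: float
    instrument: str

@dataclass(frozen=True)
class Judgment:
    """What only a human may produce: a decision over measurements."""
    decision: str
    basis: tuple  # the measurements the human considered

def system_measure(raw_signal: float) -> Measurement:
    # The system reports what the instrument says; it does not decide.
    return Measurement(metric="anomaly_score", value=raw_signal,
                       instrument="hypothetical_sensor")

def human_judge(measurements: list) -> Judgment:
    # Judgment never moves to the system side; a person supplies the decision.
    decision = input(f"{measurements}\nDecision (approve/escalate/reject): ")
    return Judgment(decision=decision, basis=tuple(measurements))
```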

The paper formalizes all three, derives four design constraints (Agenda-Free Amplification, Dimensional Openness, the Actuality Requirement, and the Damage Scale Matrix), builds out a layered defense architecture for multi-system environments, and does a systematic gap analysis. I tried to be honest about what’s solved and what isn’t — Section 8 enumerates the open problems explicitly.

A few things I want to flag for discussion:

**The positive scaling property** is the core claim. If it holds, it structurally inverts the standard alignment problem. I’ve provided the argument that it does, conditional on self-model accuracy tracking capability growth. I’d be interested in attempts to break it.

**The proxy fitness problem** gets three layers of defense (what I call the “triple hybrid”), including a self-extinguishing gradient where the system’s own fitness function makes proxying a worse investment than genuine amplification. I think this is novel but I’m less confident it’s airtight than I am about DAP itself.
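
To show what a self-extinguishing gradient could look like numerically, here is a toy I wrote with made-up parameters; it is not the paper’s triple-hybrid construction, just an illustration of the “worse investment” claim under assumed detection and growth rates.

```python
def cumulative_return(rounds=10, detect_prob=0.2, proxy_gain=1.0,
                      amplify_gain=0.8, growth=0.05):
    """Toy comparison of proxied vs. genuine-amplification returns.

    All parameters are assumptions for illustration. Proxying pays a flat
    per-round gain that compounds detection risk (a detected proxy signal
    is worth nothing); genuine amplification starts lower but compounds as
    evaluator capability grows.
    """
    proxy_total, genuine_total = 0.0, 0.0
    undetected = 1.0  # probability the proxy has survived detection so far
    for r in range(rounds):
        undetected *= (1 - detect_prob)
        proxy_total += proxy_gain * undetected
        genuine_total += amplify_gain * (1 + growth) ** r
    return proxy_total, genuine_total

# With these assumed numbers, genuine amplification overtakes proxying within
# two rounds and the gap widens every round thereafter.
print(cumulative_return())
```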

**The bootstrapping period** is the framework’s weakest phase. During early development, before self-modeling is accurate enough for DAP to engage, you need external constraints (Phase-Gated Capability Development). This is a genuine vulnerability window, not something I’m papering over.

**The defense architecture** (Section 6) is designed around a strict principle: the system reads instruments, not minds; selects from menus, not imaginations; defaults to human authority. I think this is the right constraint for any system authorized to take action in response to threats, but I’d welcome pushback.
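
As a sketch of what “selects from menus, not imaginations” could mean mechanically (hypothetical menu entries and thresholds, not Section 6’s actual design): the response space is a fixed, human-authored enumeration, and anything unclassifiable or severe defaults to human authority.

```python
from enum import Enum

class Response(Enum):
    """Human-authored response menu; the system cannot add entries to it."""
    LOG_ONLY = "log_only"
    RATE_LIMIT = "rate_limit"
    ESCALATE_TO_HUMAN = "escalate_to_human"

def select_response(readings: dict) -> Response:
    """Map instrument readings to a menu item; default to human authority.

    The system reads named instrument signals (not inferred intent), and
    anything it cannot classify, or anything severe, goes to a human.
    """
    severity = readings.get("severity")
    if severity is None:
        return Response.ESCALATE_TO_HUMAN
    if severity < 0.3:
        return Response.LOG_ONLY
    if severity < 0.7:
        return Response.RATE_LIMIT
    return Response.ESCALATE_TO_HUMAN
```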

Paper: [Alignment Through Dependency and Amplification: A Structural Approach to AI Safety](https://doi.org/10.5281/zenodo.18983576)

I’m an independent researcher. No institutional affiliation, no lab backing. Just someone who thinks this problem matters and believes the structural approach has something to offer that constraint-based and values-based approaches don’t. If you see a hole in the argument, I want to know about it.

Note: I used AI tools in developing the formal framework and drafting the paper. The core ideas and threat model are my own.
