Why “Solving Alignment” Is Likely a Category Mistake

A common framing of the AI alignment problem is that it’s a technical hurdle to be overcome. A clever team at DeepMind or Anthropic would publish a paper titled “Alignment is All You Need,” everyone would implement it, and we’d all live happily ever after in harmonious coexistence with our artificial friends.

I suspect this perspective constitutes a category mistake on multiple levels. Firstly, it presupposes that the aims, drives, and objectives of both the artificial general intelligence and what we aim to align it with can be simplified into a distinct and finite set of elements, a simplification I believe is unrealistic. Secondly, it treats both the AGI and the alignment target as if they were static systems. This is akin to expecting a single paper titled “The Solution to Geopolitical Stability” or “How to Achieve Permanent Marital Bliss.” These are not problems that are solved; they are conditions that are managed, maintained, and negotiated on an ongoing basis.

The Problem of “Aligned To Whom?”

The phrase “AI alignment” is often used as shorthand for “AI that does what we want.” But “we” is not a monolithic entity. Consider the potential candidates for the entity or values an AGI should be aligned with:

Individual Users: Aligning to individual user preferences, however harmful or conflicting? This seems like a recipe for chaos or enabling malicious actors. We just experienced an example of how this can go wrong with the GPT-4o update and subsequent rollback, wherein positive user feedback resulted in an overly sycophantic model personality (that was still rated quite highly by many users!) with some serious negative consequences.
Corporate/Shareholder Interests: Optimizes for proxy goals (e.g., profit, engagement) which predictably generate negative externalities. Subject to Goodhart’s Law on a massive scale.
Democratic Consensus: Aligning to the will of the majority? Historical precedent suggests this can lead to the oppression of minorities. Furthermore, democratic processes are slow, easily manipulated, and often struggle with complex, long-term issues.
AI Developer Values: Aligning to the personal values of a small, unrepresentative group of engineers and researchers? This introduces the biases and blind spots of that specific group as the de facto global operating principles. We saw how this can go wrong with the Twitter Files—imagine if that was instead about controlling the values of AGI.
Objective Morality/Coherent Extrapolated Volition: Assumes such concepts are well-defined, discoverable, and technically specifiable—all highly uncertain propositions that humanity has failed on thus far. And if we’re relying on AGI to figure this one out, I’m not sure how we could “check the proof” on this one, so we’d have to assume that the AGI was aligned…and we’re right back where we started.

This isn’t merely a matter of picking the “right” option. The options conflict, and the very notion of a stable, universally agreed-upon target for alignment seems implausible a priori.

The Target is Moving

The second aspect of the category mistake is treating alignment as something you achieve rather than something you maintain. Consider these analogous complex systems:

Parenting: No parent ever “solves” child-rearing. Any parent who tells you they have is either a) lying, b) deeply deluded, or c) has a remarkably compliant houseplant they’re confusing with a human child. A well-behaved twelve-year-old might be secretly clearing their browsing history to hide their real internet behavior (definitely not anecdotal). The child who dutifully attends synagogue might become an atheist in college. Parents are constantly recalibrating, responding to new developments, and hoping their underlying values transmission works better than their specific behavioral controls.
Marriage: About 40% of marriages end in divorce, and many more persist in mutual dissatisfaction. Even healthy relationships require constant maintenance and realignment as people change. The version of your spouse you married may share only partial values-overlap with the person they become twenty years later. Successful marriages don’t “solve” alignment; they continually renegotiate it, reorienting their goals and expectations as life inevitably reshapes their circumstances. Alignment in marriage is a verb, not a noun—a continuous process rather than a stable state.
Geopolitics: Nations form alliances when convenient and break them when priorities change. No nation has ever “solved” international relations because the system has too many moving parts and evolving agents. The only examples of permanent, stable alliances (Andorra/Spain/France, Bhutan/India, Monaco/France) are fascinating precisely because they are outliers, and notably involve entities with vastly asymmetrical power dynamics or unique geographic constraints. It’s nice to think that maybe humanity would be the France to AGI’s Monaco…but if anything the inverse seems more likely, with humanity being forcibly aligned to AGI.
Economics: Despite centuries of economic theory, no country has built an economy immune to recessions, inflation, or inequality. Central banks and treasury departments engage in constant intervention and adjustment, not one-time fixes. Nobody has a “solution” to the economy in the way you have a solution to 2+2=4. It’s an ongoing process of tinkering, observing, and occasionally just bracing for impact.

These examples illustrate what Dan Hendrycks (drawing on Rittel & Webber’s 1973 work) has identified as the “wicked problem” nature of AI safety: problems that are “open-ended, carry ambiguous requirements, and often produce unintended consequences.” Artificial general intelligence belongs squarely in this category of problems that resist permanent solutions.

The scale of the challenge with AGI is amplified by the potential power differential. I struggle to keep my ten-year-olds aligned with my values, and I’m considerably smarter and more powerful than they are. With AGI we’re talking about creating intelligent, agentic systems, but unlike children they will be smarter, think faster, and be more numerous than us. We will change, they will change, the environment will change. Maintaining alignment will be a continuous, dynamic process.

This doesn’t mean we should abandon alignment research. We absolutely need the best alignment techniques possible. But we should be clear-eyed about what success looks like: not a solved problem, but an ongoing, never-ending process of negotiation, adaptation, and correction. Perhaps given the misleading nature of the current nomenclature, using a different phrase such as Successfully Navigating AI Co-evolution would better capture the dynamic, relational, and inherently unpredictable nature of integrating AGI successfully with humanity.

This also provides an argument against defining alignment as following a person’s desires instead of an ethos or worldview. If OpenBrain leaders want the AI to create the Deep Utopia, while some human researchers convince the AI to adopt another policy compatible with humanity’s interests and to align all future AIs to the policy, then the AI is misaligned from OpenBrain’s POV, but not from the POV of those who don’t endorse the Deep Utopia.

The most extreme example of such relations is chatbot romance that is actually likely to harm the society.