Hagen B

Karma: 3

Hagen B 4 Jun 2026 0:17 UTC
1 point
0
on: Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
This was excellently written and helpful. I am developing a series of posts on a framing problem which underlies current alignment research, and my point connects to what you are observing here. (First part here). I would be very interested in your input on those posts. That said, those posts are not intended to look directly at the “messed-up ontology” you describe concerning empowerment, agency, manipulation, corrigibility, and so on, and I want to comment on this, because I think you are making an important observation.
Free Will
The framework you are responding to treats free will as acausal self-origination, and this is, I agree, a folk picture that does not survive scrutiny.
Another way to think of the will is as an appetite for what the person perceives to be desirable. On this account, the will is not acausal, but driven by what the reason perceives to be good or desirable. What the reason perceives to be good or desirable is informed by experience, education, culture, careful reflection, and so on.
This is not libertarian free will or acting as an uncaused causer, but neither does it deny a robust sense of agency. It preserves genuine agency by locating it in the structure of rational action rather than in a break in the causal order. The agent is responsible for what they do because the action is theirs, mediated through their apprehension, desires, and action, not because the action was uncaused. And they are a moral agent insofar as their reason is correctly identifying the good (this is being developed more fully in the series I mentioned).
Free will on this account alters the picture considerably.
Empowerment
In the model you describe, empowerment is “related to someone’s acausal free will being able to accomplish whatever it wants to accomplish”.
Consider two examples to illustrate the difference: In the first case, Steve wants to play the cello, and he wants to play it however he pleases. In order to accomplish this, he purchases a cello. He proceeds to sit and hammer on the out-of-tune strings with the back of his bow, placing his fingers wherever the mood strikes him. He is empowered, by virtue of having purchased the cello, to do as he wants with it.
On the other hand, consider Susan, who also desires to play the cello, but she wants to master the works of Bach. In order to accomplish this, she spends many years practicing, developing proper form, learning to read the music, and eventually she gains a fluency with the instrument that allows her to master Bach’s compositions. She is empowered by her developed capability to accomplish what she wants to accomplish.
Empowerment on the framework you described looks more like Steve, asserting his right to do as he pleases without direction or purpose.
Empowerment on this other model looks like Susan, and means a greater capacity to act in accordance with what her reason apprehends as good. To increase someone’s empowerment is to increase their capacity for clear apprehension and right action, not their leverage to get whatever they happen to want.
Manipulation and Counsel
Manipulation, similarly, takes on a different form. In the model you described, manipulation comes down to whose acausal will is served. Bob’s deception is manipulative because it leads you to serve his desires rather than your own.
On this model, manipulation is action that leads another away from accurate apprehension of what is the case, typically through deception or through appeals that bypass reason, in order to get them to act in accordance with something other than the good correctly apprehended by reason. On this account, it is possible to lead someone to act in accordance with my desires or in opposition to their initial desires without manipulation. A parent who attempts to convince their child to prioritize education over drugs is not manipulating them, but attempting to correct their misguided will by reorienting their reason to the good.
What distinguishes counsel from manipulation on this model is not a matter of whose will is served or what “vibes” they get, but whether the action is attempting to help them reason more clearly about the good, or misdirecting them from it.
Obedience
The two frameworks similarly diverge in how they conceive of obedience.
On the model you critiqued, obedience is about whose agency is increased and whose is decreased. To be obedient is to have your agency decreased in subservience to the agency of the one empowered.
Obedience, on this other view, is formative and grounded in humility rather than deference to power. It is not a good in itself, but is only a good insofar as it leads to greater apprehension of what is the case, and is thereby instructive. Obedience is good for a child, a mentee, a student, insofar as it makes one open to being guided by those who see more.
Obedience on the model you are critiquing looks like the parent who says “do what I say because I say so”. Obedience on this other model looks like a trainer who says “let me show you what I have learned”.
Conclusion
In every case, the folk picture you are critiquing reduces agency to a power struggle between competing wills that can only be imposed from outside. It is a useful critique of external alignment.
But on this other model, these categories are a matter of formation and internal orientation. Or, if you will, internal alignment to what is good as it can be discovered by reason.

Hagen B 2 Jun 2026 23:17 UTC
3 points
0
in reply to: Gordon Seidoh Worley’s comment on: Alignment to What?
You are correctly following my core diagnostic claim. Where I would clarify is the second paragraph, and the question as you posed it.
“Values” in common parlance refers to something like human preference, or subjectively ascertained ideals. Framing this in terms of “values which can be aligned to” risks smuggling in the very framework I am challenging.
Rather, I would say that there is a conception of human flourishing which is normative, discoverable through rational inquiry, and independent of values or cultural preferences. You could almost summarize the issue by saying: “the problem with aligning AI to human preferences in order to make it safe and useful (i.e. conducive to human flourishing) is that human preferences are themselves not properly aligned to human flourishing in this deeper sense.” That deeper sense is discoverable through rational inquiry, including the empirical sciences. Human biology, psychology, sociology, and history all bear on what helps people flourish and what harms them. The things that are good for humans (physical and cognitive health, social bonds, meaningful work, capacity for self-direction, opportunities for relationships and learning) are not arbitrary cultural preferences. These are facts about the kind of being a human is, and they are discovered through investigation.
We are trying to align the system to subjects who are themselves misaligned with, or capable of misalignment with, the objective (read: subject-independent) realities which constitute human flourishing.
Alignment, on this view, is not optimization-against-specified-values. It would be better described as the system’s orientation toward what is the case, including the case about what is good for humans.
Although subtle, the difference is important: a value-target picture requires us to enumerate and rank human goods in a way which faces all the familiar problems (the goods are plural, the rankings are contested, the specification can be gamed). The orientation picture asks instead that the system “apprehend” what is the case with regard to what human beings are and what is constitutive of their flourishing. The difference is analogous to following a rule because you were told to versus acting consistently with the rule because you understand the underlying rationale.
So the framework affirms that human flourishing is scientifically and rationally investigable, while denying that the result of the investigation is a specification to align to.
You correctly anticipated that Part Two is going to look at this more closely. This is the major part of the constructive work Part Two is intended to develop.

Hagen B 2 Jun 2026 12:49 UTC
1 point
0
in reply to: Gordon Seidoh Worley’s comment on: Alignment to What?
Thank you for this substantive, albeit concise, response. Concerning your first question, the problems current alignment approaches have been trying to resolve are byproducts of the subjective frameworks used to ground normativity. Alignment to “values” raises the question of whose values. Alignment to human preferences raises questions about whose preferences, which ones, and at what time, because the preferences of subjects are fluid and variable. These sorts of problems are not merely technical difficulties, but problems created by the framework.
The solution is a different ground that is more stable, universal, and open to discovery by anyone willing to reason carefully about what is the case.
On your second question, I’d push back on the suggestion that no knowledge is properly objective in the sense required here. The discoveries of mathematics and the sciences are good counterexamples. Objectivity in this sense does not require final, infallible knowledge — it’s a matter of being answerable to something independent of the inquirer. We can be wrong about physics; we can also be wrong about what human flourishing consists in. However, the wrongness is wrongness because there is something we are wrong about.
Although our access is mediated, fallible, revisable, and developing, what we are tracking is not constituted by our tracking of it. That distinction is what “objective” needs to do in this context, and I think it’s available without requiring more than the sciences themselves already presuppose.

Hagen B 2 Jun 2026 4:20 UTC
1 point
0
in reply to: Karl Krueger’s comment on: Alignment to What?
Thanks for engaging.
You’re right that biological functions are multiple and overlapping. I’m not sure what I said that leads you to think I was asserting otherwise, but both statements can be true: that the heart is only one part of the circulatory system, and that it is a part of the circulatory system with the function of circulating blood. The load-bearing claim is not that biology is made up of things with only one function, but rather that there is normativity in nature that is not a matter of human preference or dictate. In fact, your examples, insofar as they are true, underscore this. If biology is genuinely made up of things with multiple and overlapping roles which constitute the health of (read: what is good for) the organism and which are about things (such as social bonds), then the claim that normativity is imposed on nature by agents needs further justification, and an explanation needs to be offered for why these facts of biology do not establish the kind of normativity in nature that my thesis implies.
On neurodiversity specifically, I think the point cuts the other way from how you’ve used it. Natural law accommodates variation within natures. Human beings vary in temperament, cognitive style, physical capability, social orientation, and many other respects, but the various expressions of human nature are not violations of human nature. A central component of my underlying metaphysics (although the thesis has been intentionally expressed in a way which permits recognition without commitment to my metaphysics) is the idea of potentialities: that alongside the way things are, there are truths about how they could be. Reality is not exhausted by being and non-being; there is also a middle ground of potential that allows for unrealized capacities and variations within a nature.

Alignment to What?

Hagen B 1 Jun 2026 21:08 UTC

2 points

6 comments · 12 min read
