You are correctly following my core diagnostic claim. What I would clarify is the second paragraph and the question as you posed it.
“Values” in common parlance refers to something like human preference, or subjectively ascertained ideals. Framing this in terms of “values which can be aligned to” risks smuggling in the very framework I am challenging.
Rather, I would say that there is a conception of human flourishing which is normative, discoverable through rational inquiry, and independent of values or cultural preferences. You could almost summarize the issue by saying: “the problem with aligning AI to human preferences in order to make it safe and useful (i.e. conducive to human flourishing) is that human preferences are themselves not properly aligned to human flourishing in this deeper sense, which is discoverable through rational inquiry, such as the empirical sciences.” Human biology, psychology, sociology, and history all bear on what helps people flourish and what harms them. The things that are good for humans (physical and cognitive health, social bonds, meaningful work, capacity for self-direction, opportunities for relationships and learning) are not arbitrary cultural preferences. They are facts about the kind of being a human is, and they are discovered through investigation.
We are trying to align the system to subjects who are themselves misaligned, or capable of misalignment, with the objective (read: subject-independent) realities which constitute human flourishing.
Alignment, on this view, is not optimization-against-specified-values. It would be better described as the system’s orientation toward what is the case, including the case about what is good for humans.
Although subtle, the difference is important: a value-target picture requires us to enumerate and rank human goods in a way which faces all the familiar problems (the goods are plural, the rankings are contested, the specification can be gamed). The orientation picture asks instead that the system “apprehend” what is the case with regard to what human beings are and what is constitutive of their flourishing. The difference is analogous to following a rule because you were told to versus acting consistently with the rule because you understand the underlying rationale.
So the framework affirms that human flourishing is scientifically and rationally investigable, while denying that the result of the investigation is a specification to align to.
You correctly anticipated that Part Two is going to look at this more closely. This is the major part of the constructive work Part Two is intended to develop.
This was excellently written and helpful. I am developing a series of posts on a framing problem which underlies current alignment research, and my point connects to what you are observing here. (First part here). I would be very interested in your input on those posts. That said, those posts are not intended to look directly at the “messed-up ontology” you describe concerning empowerment, agency, manipulation, corrigibility, and so on, and I want to comment on this, because I think you are making an important observation.
Free Will
The framework you are responding to treats free will as acausal self-origination, and this is, I agree, a folk picture that does not survive scrutiny.
Another way to think of the will is as an appetite for what the person perceives to be desirable. On this account, the will is not acausal, but driven by what the reason perceives to be good or desirable. What the reason perceives to be good or desirable is informed by experience, education, culture, careful reflection, and so on.
This is not libertarian free will or acting as an uncaused causer, but neither does it deny a robust sense of agency. It preserves genuine agency by locating it in the structure of rational action rather than in a break in the causal order. The agent is responsible for what they do because the action is theirs, mediated through their apprehension, desires, and action, not because the action was uncaused. And they are a moral agent insofar as their reason is correctly identifying the good (this is being developed more fully in the series I mentioned).
Understanding free will in this way alters the picture considerably.
Empowerment
In the model you describe, empowerment is “related to someone’s acausal free will being able to accomplish whatever it wants to accomplish”.
Consider two examples to illustrate the difference: In the first case, Steve wants to play the cello, and he wants to play it however he pleases. In order to accomplish this, he purchases a cello. He proceeds to sit and hammer on the out-of-tune strings with the back of his bow, placing his fingers wherever the mood strikes him. He is empowered, by virtue of having purchased the cello, to do as he wants with it.
On the other hand, consider Susan, who also desires to play the cello, but she wants to master the works of Bach. In order to accomplish this, she spends many years practicing, developing proper form, learning to read the music, and eventually she gains a fluency with the instrument that allows her to master Bach’s compositions. She is empowered by her developed capability to accomplish what she wants to accomplish.
Empowerment on the framework you described looks more like Steve, asserting his right to do as he pleases without direction or purpose.
Empowerment on this other model looks like Susan, and means a greater capacity to act in accordance with what her reason apprehends as good. To increase someone’s empowerment is to increase their capacity for clear apprehension and right action, not their leverage to get whatever they happen to want.
Manipulation and Counsel
Manipulation, similarly, takes on a different form. In the model you described, manipulation comes down to whose acausal will is served. Bob’s deception is manipulative because it leads you to serve his desires rather than your own.
On this model, manipulation is action that leads another away from accurate apprehension of what is the case, typically through deception or through appeals that bypass reason, in order to get them to act in accordance with something other than the good correctly apprehended by reason. On this account, it is possible to lead someone to act in accordance with my desires or in opposition to their initial desires without manipulation. A parent who attempts to convince their child to prioritize education over drugs is not manipulating them, but attempting to correct their misguided will by reorienting their reason to the good.
What distinguishes counsel from manipulation on this model is not a matter of whose will is served or what “vibes” they get, but whether the action is attempting to help them reason more clearly about the good, or misdirecting them from it.
Obedience
The two frameworks similarly diverge in how they conceive of obedience.
On the model you critiqued, obedience is about whose agency is increased and whose is decreased. To be obedient is to have your agency decreased in subservience to the agency of the one empowered.
Obedience, on this other view, is formative and grounded in humility rather than deference to power. It is not a good in itself, but is only a good insofar as it leads to greater apprehension of what is the case, and is thereby instructive. Obedience is good for a child, a mentee, a student, insofar as it makes one open to being guided by those who see more.
Obedience on the model you are critiquing looks like the parent who says “do what I say because I say so”. Obedience on this other model looks like a trainer who says “let me show you what I have learned”.
Conclusion
In every case, the folk picture you are critiquing reduces agency to a power struggle between competitive wills that can only be imposed from outside. It is a useful critique of external alignment.
But on this other model, these categories are a matter of formation and internal orientation. Or, if you will, internal alignment to what is good as it can be discovered by reason.