k_polym

Karma: 0

k_polym 9 Jun 2026 16:31 UTC
on: Empowerment, corrigibility, etc. are simple abstractions (of a messed-up ontology)
While it doesn’t solve your problem, I think a clearer distinction between preferences and plans would somewhat narrow the issue, and clear up some of the mess around manipulation vs. counsel and the like. For example, focus on the prediction and planning phase and hold preferences constant. Suppose agent 1 is discussing their plans with an AI, or more generally with another agent 2. Here, the difference between manipulation and honest counsel from the AI is easier to pin down: if the AI is truthfully representing how it expects each scenario to play out, it’s helpful counsel. If the AI is distorting the predictions it presents to agent 1 in order to steer agent 1 toward particular actions the AI prefers, it is manipulation.
I mention this because some of your examples, like providing more information, seem to be affecting predictions rather than preferences. In my opinion, the conventional intuition is that preferences are outside of the agent’s control, and free will enters in how we choose to act given these preferences (even if going against them). Following the hunger signalling analogy, I would consider eating every few hours part of the agent’s preferences, over which the agent does not have control. Given these preferences, the agent uses its free will to choose its actions.
Interacting with other agents to refine our predictions, and indirectly our actions, is a lot safer than anything touching our preferences directly. I think at a minimum I would want to give explicit consent before another agent tries to change my preferences directly, even if that is to my benefit, and even if I would choose to allow the agent to do so. As an example, eating burgers tends to be more satisfying than eating salads. If an agent could change my preferences so that I enjoyed salads more than I enjoy burgers now, and nothing else, I’d want to give my permission before they did so. It’s actually likely that I would give my permission for this, but I’d want to be asked first!
I think as a general rule of thumb I would support something like this:
- when it comes to refining predictions, honesty from the AI’s side is enough. As long as the AI is accurately describing its predictions for the outcomes of my actions or plans, I don’t consider myself manipulated if, after interacting with an AI, I change my plans. My free will is not diminished.
- when it comes to affecting my preferences, since some of them are part of my personality, explicit consent is needed. Having my preferences altered without my consent amounts to diminishing my free will.
It’s possible that you are perfectly aware of this distinction and that the post was referring strictly to preferences, not predictions or plans at all. If that’s the case, it might be helpful to work into the ontology some version of the planning process: considering various courses of action, predicting their outcomes, and evaluating them based on your preferences before deciding which to follow.
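To make the distinction a bit more concrete, here is a toy Python sketch of the planning process described above. Everything in it is an illustrative assumption on my part: the names `plan`, `is_honest_counsel` and `update_preferences`, the outcome labels, and the numeric scores are all made up, and real preferences and predictions are obviously not lookup tables. The only point is that predictions and preferences are separate inputs to planning, so "honesty" and "consent" can be checked on different channels.

```python
from typing import Dict

Predictions = Dict[str, str]    # action -> predicted outcome (hypothetical labels)
Preferences = Dict[str, float]  # outcome -> how much the agent values it (made-up scores)


def plan(predictions: Predictions, preferences: Preferences) -> str:
    """Pick the action whose predicted outcome scores highest under the preferences."""
    return max(predictions, key=lambda action: preferences[predictions[action]])


def is_honest_counsel(reported: Predictions, advisor_beliefs: Predictions) -> bool:
    """Counsel is honest if the advisor reports exactly what it actually predicts."""
    return reported == advisor_beliefs


def update_preferences(preferences: Preferences,
                       proposed: Preferences,
                       consent: bool) -> Preferences:
    """Apply a proposed preference change only if the agent explicitly consented."""
    if not consent:
        return dict(preferences)
    return {**preferences, **proposed}


# Agent 1's fixed preferences and initial (naive) predictions.
preferences = {"satisfied": 1.0, "healthy": 0.6, "sluggish": -0.5}
my_predictions = {"burger": "satisfied", "salad": "healthy"}

# What the AI actually expects to happen (assumed for illustration).
ai_beliefs = {"burger": "sluggish", "salad": "healthy"}

before = plan(my_predictions, preferences)        # "burger"

# Honest counsel: the AI reports its real predictions; preferences are untouched.
assert is_honest_counsel(ai_beliefs, ai_beliefs)
after_counsel = plan(ai_beliefs, preferences)     # "salad"

# Manipulation: the AI reports predictions it does not itself hold.
distorted = {"burger": "sluggish", "salad": "satisfied"}
assert not is_honest_counsel(distorted, ai_beliefs)

# Preference change: goes through only with explicit consent.
proposed = {"healthy": 1.5}
without_consent = update_preferences(preferences, proposed, consent=False)
with_consent = update_preferences(preferences, proposed, consent=True)
```

In this sketch the plan flips from burger to salad in two different ways: through better predictions (honest counsel, same preferences) or through altered preferences (which the consent flag gates). The rule of thumb above treats only the second as needing permission.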