I wouldn’t say that preference utilitarianism “falls apart”; it just becomes much harder to implement.
And I’d like a little more definition of “autonomy” as a value—how do you operationally detect whether you’re infringing on someone’s autonomy?
My (still very informal) suggestion is that you don’t try to measure autonomy directly and optimize for it. Instead, you try to define and operate from informed consent. This (maybe) allows a system to have enough autonomy to perform complex and open-ended tasks, but not so much that you expect perverse instantiations of goals.
My proposed definition of informed consent is “the human wants X and understands the consequences of the AI doing X”, where X is something like a probability distribution on plans which the AI might enact. (… that formalization is very rough)
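To make the roughness explicit, here is one purely illustrative way that definition might be written out. Everything here is my own placeholder notation ($H$ for the human, $M_H$ and $M_{AI}$ for each party's predictive model, $d$ for some divergence, $\varepsilon$ for a tolerance), not an established formalism:

$$
\mathrm{Consent}(H, X) \;\iff\;
\underbrace{H \text{ prefers that the AI enact } X}_{\text{``wants } X\text{''}}
\;\wedge\;
\underbrace{\mathbb{E}_{p \sim X}\big[\, d\big(M_H(p),\, M_{AI}(p)\big) \big] < \varepsilon}_{\text{``understands the consequences''}}
$$

where $X$ is a distribution over plans $p$, and $M_H(p)$, $M_{AI}(p)$ are the human's and the AI's predicted consequences of plan $p$. The second conjunct tries to cash out "understands" as "the human's expectations about outcomes approximately match the AI's"; obviously both conjuncts smuggle in hard subproblems (eliciting preferences, comparing world-models).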
Is it just the right to make bad decisions (those which contradict stated goals and beliefs)?
This is certainly part of respecting an agent’s autonomy. More generally, I think respecting someone’s autonomy means not taking away their freedom, not making decisions on their behalf without prior permission, and not operating from your own assumptions about what is good or bad for them.