Here, the optimal decisions would be the higher-order outputs which maximize higher-order utility. They are decisions about what to value or how to decide rather than about what to do.
To capture rational values, we are trying to focus on the changes to values that flow out of satisfying one’s higher-order decision criteria. By unrelated distortions of value, I pretty much mean changes in value from any other causes, e.g. from noise, biases, or mere associations.
In the code and outline, I call the lack of distortion Agential Identity (similar to personal identity). I had previously tried to just extract the criteria out of the brain and operate on them directly. But now I think the brain is sufficiently messy that we can only simulate many continuations and aggregate them. That opens up a lot of potential to stray far from the original state. Agential Identity helps ensure we’re uncovering your dispositions rather than those of a stranger or a funhouse-mirror distortion.
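To make the "simulate many continuations and aggregate them" idea concrete, here is a toy sketch. It is not the code referred to above; every name in it (`ValueChange`, `Continuation`, `agential_identity`, `aggregate`, the `min_identity` threshold) is hypothetical. It also assumes each simulated change in value already comes labeled as governed by a higher-order criterion or not, which is exactly the hard part it leaves out.

```python
from dataclasses import dataclass


@dataclass
class ValueChange:
    value: str      # name of the value whose weight changes
    delta: float    # signed change in that weight
    governed: bool  # ASSUMPTION: True if the change flows from satisfying
                    # a higher-order decision criterion (given, not computed)


@dataclass
class Continuation:
    changes: list   # list of ValueChange from one simulated continuation


def agential_identity(cont):
    """Share of total change magnitude that is governed (1.0 = no distortion)."""
    total = sum(abs(c.delta) for c in cont.changes)
    if total == 0:
        return 1.0  # nothing changed, so nothing was distorted
    governed = sum(abs(c.delta) for c in cont.changes if c.governed)
    return governed / total


def aggregate(initial, continuations, min_identity=0.5):
    """Average governed changes over continuations that stay close to the agent."""
    scored = [(c, agential_identity(c)) for c in continuations]
    kept = [(c, w) for c, w in scored if w >= min_identity]
    total_w = sum(w for _, w in kept)
    result = dict(initial)
    if total_w == 0:
        return result  # no trustworthy continuation, so keep current values
    for cont, w in kept:
        for ch in cont.changes:
            if ch.governed:  # ungoverned changes count as distortion and are dropped
                result[ch.value] = result.get(ch.value, 0.0) + (w / total_w) * ch.delta
    return result
```

The weighting and the hard threshold are arbitrary design choices for illustration. The point is only that continuations dominated by ungoverned changes contribute little or nothing to the aggregate.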
What constitutes utility here, then? For example, some might say utility is grounded in happiness or meaning; in economics we often measure utility in money; and I’ve been thinking along the lines of grounding utility (through value) in minimization of prediction error. It’s fine that you are concerned with higher-order processes (I’m assuming you mean processes about processes, so that higher-order outputs are outputs about outputs and higher-order utility is utility about utility). Maybe you are primarily concerned with abstractions that let you ignore these details. Even so, those abstractions must be embodied in specifics at some point, or else they don’t describe reality well. After all, meta-values/preferences/utility functions are still values/preferences/utility functions.
How do you distinguish whether something is a distortion or not? You point to some things that you consider distortions, but I’m still unclear on the criteria by which you know distortions from the rational values you are looking for. One person’s bias may be another person’s taste. I realize some of this may depend on how you identify higher-order processes, but even if that’s the case we’re still left with the question as it applies to those directly, i.e. is some particular higher-order decision criterion a distortion or rational?
This seems strange to me, because much of what makes a person unique lies in their distortions (speaking loosely here), not in the absence of them. Normally, when we think of distortions, they take an agent away from a universal perfected norm, and that norm would ideally be the same for all agents if it weren’t for distortions. What leads you to think there are personal dispositions that are neither distortions nor universal, where the universal ones are those caused by the shared rationality norm?
Officially, my research is metaethical. I tell the AI how to identify someone’s higher-order utility functions but remain neutral on what those actually are in humans. Unofficially, I suspect they amount to some specification of reflective equilibrium and prescribe changing one’s values to be more in line with that equilibrium.
On distortion, I’m not sure what else to do but repeat myself. Distortions are just changes in value not governed by satisfying higher-order decision criteria. The examples I gave are not part of the specification; they’re just things I expect to be included.
Distortion is also not meant to cover all irrationality or nonoptimality. It’s just a corrective to a necessary part of the parliamentary procedure: we must simulate the brain’s continuation in some specific circumstance or other, and that circumstance brings its own influences. So, I wouldn’t call a higher-order criterion a distortion even if it gets rejected. It’s more like a prima facie reason that gets overruled. In any case, we can evaluate such criteria as rational or not, but we’d be doing so by invoking some higher-order criteria (other ones, unless the criterion is reflective).
For the most part, I don’t believe in norms universal to all agents. Given our shared evolutionary history, I expect significant overlap among humans, but also some subtle differences arising from development and the environment. It may also be worth mentioning that even with the same norm, we can preserve uniqueness if, for instance, the norm takes one’s current state into consideration.