I was going to note that this seems to be the social-interaction special case of policy utilitarianism (which I’ve used for years and would credit with some quality-of-life improvements).
However, a quick Google search suggests “policy utilitarianism” isn’t an established term, so I have no idea what this concept is actually called, assuming I didn’t make it up. In short, it’s a mix of functional decision theory, (rule) utilitarianism, and psychology (and possibly some Buddhism), along with some handwaving for the Hard Problem of learning under bounded rationality (which I’d assert the brain is good enough at that a human ethical framework doesn’t need an explicit algorithm for it).
To go into the details: we know from psychology (among other fields) that we don’t have full control over our “actions” at the conventional “object level”. This applies both to individuals (e.g. addictions, cognitive biases, simple ignorance constraining outcome-predictions, etc.) and to societies (as Anna discusses above through some of the objections to prioritizing AI safety regulations).
So, instead of being consequentialist over external actions, take the subcomponent(s) of your mind that can be said to “make decisions”, and treat your action space as the set of possible transitions between policies over that system’s output, starting from whatever policy it currently implements (likely generated through a mix of genetic inheritance, early-life experiences, and, to a lesser extent, more recent experiences).
Everything outside that one decision membrane (including the inputs to that decision-making mind-component from other parts of your brain and body) is an objective environmental factor which should be optimized based on your current decision policy (or any recursively-improved variants it generates by making decisions over the aforementioned decision-policy space).
I’m handwaving away “the part of your mind that makes decisions” because I don’t know that we can definitively narrow it down without perfect self-knowledge, and because I think a practical approximation from introspection is good enough to get benefits from this framework.
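To make the policy-transition framing concrete, here’s a minimal toy sketch in Python; the “craving” input, the two candidate policies, and the payoff numbers are all assumptions I’ve invented for illustration, not part of any established formalism:

```python
import random

# Toy sketch of the policy-transition framing (all names and numbers are my own
# illustration, not an established formalism). The only "choice" being made is
# which policy to run going forward; individual outputs then follow from it.

def simulate(policy, n_samples=10_000, seed=0):
    """Crude model of everything outside the decision membrane: noisy inputs
    (e.g. an impulse arriving from the body) come in, the policy maps them to
    outputs, and the modeled environment scores the result."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        craving = rng.random()         # input the decision-maker doesn't control
        action = policy(craving)       # the policy's output
        total += 1.0 if action == "abstain" else craving - 0.5
    return total / n_samples           # estimated expected utility of the policy

impulsive = lambda craving: "indulge" if craving > 0.3 else "abstain"
precommitted = lambda craving: "abstain"

# The action space is the set of policy transitions, evaluated by expected utility:
current, candidates = impulsive, [precommitted]
best = max([current] + candidates, key=simulate)
print("switch policy" if best is not current else "keep current policy")
```

The point of the sketch is just that the expected-utility comparison happens over policies, not over individual indulge/abstain acts.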
For computational efficiency purposes, we can model our actions over the partial-policy space rather than the total-policy space, as is done in e.g. symbolic planning, and identify policies which tend to have good outcomes in either specific or general circumstances. This naturally generates something very much like deontological morality as a computational shortcut, while maintaining the ability to override these heuristics in highly constrained and predictable circumstances.
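As a loose illustration of that heuristic-plus-override structure (the specific rule, the override condition, and the situation keys below are hypothetical examples of mine, not claims about which rules are actually correct):

```python
# Toy sketch of deontology-as-cached-partial-policy; the rule, the override
# condition, and the situation keys are hypothetical examples of mine.

CACHED_RULE = "don't lie"  # adopted once because it scores well across most situations

def narrow_override(situation):
    # Overrides fire only in highly constrained, predictable circumstances
    # that were explicitly evaluated in advance.
    return situation.get("murderer_at_door", False)

def decide(situation):
    if narrow_override(situation):
        return "lie"        # case-by-case evaluation for the pre-identified edge case
    return CACHED_RULE      # otherwise apply the cached heuristic without re-deriving it

print(decide({"asked_where_friend_is": True}))                            # don't lie
print(decide({"asked_where_friend_is": True, "murderer_at_door": True}))  # lie
```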
Extending the above point on deontological morality: since there is no privileged boundary separating the body and the external world, collaboration under ‘policy utilitarianism’ becomes an engineering problem of choosing heuristics that either constrain the behavior of others towards your utility function, or make you predictable enough that others are incentivized to act in alignment with it. (For the moralists in the audience, note that your utility function can include the preferences of others via altruism.)
In practice, humans generally don’t have enough of a cognitive advantage over each other to reliably constrain others’ behavior without some degree of cooperation with other humans or artificial tools (or single combat, if you’re into that). As such, human-to-human communication and collaboration relies on all parties applying decision-heuristics which are compatible with each other on the relevant time-scales, and which provide enough mutual predictability to convince all parties of the benefits of Cooperate over Defect, without excessive computational burden on the broadcaster or the receiver.
I suspect you could derive these constraints academically from signal theory and game theory, respectively, but I haven’t looked deeply enough to know the required axioms for such a proof.
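I haven’t attempted that derivation, but a toy numbers-check of the mutual-predictability point is straightforward (the payoff matrix and the “mirror my move if my policy is legible, otherwise defect” model are assumptions I’m inventing purely for illustration):

```python
# Toy numbers-check; the payoff matrix and the "mirror my move if my policy is
# legible, otherwise defect" model are assumptions invented for illustration.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def expected_value(my_move, predictability):
    """The other party reads my policy and mirrors my move with probability
    `predictability`; otherwise they defect as a safe default."""
    return (predictability * PAYOFF[(my_move, my_move)]
            + (1 - predictability) * PAYOFF[(my_move, "D")])

for p in (0.0, 0.5, 0.9):
    print(f"p={p}: EV(Cooperate)={expected_value('C', p):.1f}, "
          f"EV(Defect)={expected_value('D', p):.1f}")
# With these payoffs, Cooperate only beats Defect once predictability exceeds ~1/3.
```

Under these made-up numbers, predictability is what does the work: Cooperate only wins once the other party can read your policy often enough.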
Given the above two constraints, policy utilitarianism produces (at least in my eyes) the recommendation to design ‘ethical heuristics’ which are both intelligible to others and beneficial to your goals, and to apply them near-unilaterally for social decision-making.
One useful ‘ethical heuristic’, given the above reasoning for why we want these heuristics at all, is to share your heuristics with others who share (aspects of) your core values; this improves your ability to collaborate (due to mutual predictability), and, if you trust your critical thinking, communities using this heuristic also mitigate individual computational constraints on heuristic design by de-duplicating research work (verifying someone else’s heuristic is much cheaper than deriving your own, in the same spirit as P vs NP).
In service of these goals, the heuristics you share should not require significant investment from others to adopt (i.e. they should inherently contain ‘bridging’ components), and should be useful for pursuing the values you share with your interlocutor (so that they are willing to adopt said heuristics). Again, I don’t know if I’m handwaving too much of the intermediate reasoning, or misinterpreting Anna as calling out the general principle of engineered ethics when she really intends to call out specifically the heuristic in the previous paragraph; but as far as I can tell this produces the points in this article as a special case.
Curious if anyone has encountered this idea before, and also whether I’m misinterpreting Anna’s point in relation to it? (General critiques are welcome as well, since, as mentioned, I use the above principle myself.)